Re: Exporting Score value from export handler
Hi Joel, I saw your response this morning, and have created an issue, SOLR-8664, and linked it to SOLR-8125. As context, I included my original question and your answer, as a comment. Cheers Akiel From: Joel Bernstein To: solr-user@lucene.apache.org Date: 29/01/2016 13:46 Subject: Re: Exporting Score value from export handler Exporting scores would be a great feature to have. I don't believe it will add too much complexity to export and sort by score. The main consideration has been memory consumption for very large export sets. The export feature powers SQL queries that are unlimited in Solr 6. So adding scores to export would support queries like: select id, title, score from tableX where a = '(a query)' Where currently you can only do this: select id, title, score from tableX where a = '(a query)' limit 1000 Can you create a jira for this and link it to SOLR-8125? Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jan 29, 2016 at 8:26 AM, Akiel Ahmed wrote: > Hi, > > I would like to issue a query and get the ID and Score for each matching > document. There may be lots of results so I wanted to use the export > handler, but unfortunately the current version of Solr doesn't seem to > export the Score - I read the comments on > https://issues.apache.org/jira/browse/SOLR-5244 (Exporting Full Sorted > Result Sets) but am not sure what happened with the idea of exporting the > Score. Does anybody know of an existing or future version where this can > be done? > > I compared exporting 100,000 IDs via the export handler with getting > 100,000 ID,Score pairs using the cursor mark - exporting 100,000 IDs was > an order of magnitude faster on my laptop. Does anybody know of a faster > way to retrieve the ID,Score pairs for a query on a SolrCloud deployment > and/or have an idea on the possible performance characteristics of > exporting ID, Score (without ranking) if it was to be implemented?
> > Cheers > > Akiel > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
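For reference, the cursorMark deep-paging loop Akiel benchmarked against the export handler has the following shape (a minimal sketch: the HTTP request is abstracted behind a `fetch_page` callable so the control flow is visible, and the stub below stands in for a real Solr server purely to exercise the loop):

```python
def fetch_all(fetch_page):
    """Drain a query's full result set via Solr cursorMark deep paging.

    fetch_page(cursor) stands in for one HTTP request: it must send the
    query with cursorMark=cursor (plus a sort ending on the uniqueKey
    field, as cursorMark requires) and return (docs, nextCursorMark).
    Per the cursorMark contract, paging is finished when the cursor
    Solr returns equals the cursor that was sent.
    """
    docs, cursor = [], "*"              # "*" starts a fresh cursor
    while True:
        page, next_cursor = fetch_page(cursor)
        docs.extend(page)
        if next_cursor == cursor:       # cursor did not advance: done
            return docs
        cursor = next_cursor


def make_fake_fetch(pages):
    """A stub standing in for the real Solr call, used only to show the
    termination condition; cursors here are fabricated strings."""
    def fetch_page(cursor):
        i = 0 if cursor == "*" else int(cursor[1:])
        if i >= len(pages):
            return [], cursor           # exhausted: same cursor comes back
        return pages[i], "c%d" % (i + 1)
    return fetch_page
```

Note the extra round trip at the end (an empty page whose cursor matches the one sent) is part of the protocol; the per-request overhead of this loop is one reason /export is an order of magnitude faster for full dumps.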
Re: online scoring explanation
that's it! and doug is the one from back in the day :) thanks guys -- *John Blythe* Product Manager & Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Mon, Feb 8, 2016 at 3:08 PM, Toke Eskildsen wrote: > John Blythe wrote: > > last year i had gotten a site recommended to me on this forum. it helped > > you break down the results/score you were getting from your queries. > > http://splainer.io/ perhaps? > > - Toke Eskildsen >
Re: online scoring explanation
Hi, I did a chrome extension: https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl Hope this helps, Vincenzo On Tue, Feb 9, 2016 at 11:39 AM, John Blythe wrote: > that's it! > > and doug is the one from back in the day :) > > thanks guys > > -- > *John Blythe* > Product Manager & Lead Developer > > 251.605.3071 | j...@curvolabs.com > www.curvolabs.com > > 58 Adams Ave > Evansville, IN 47713 > > On Mon, Feb 8, 2016 at 3:08 PM, Toke Eskildsen > wrote: > > > John Blythe wrote: > > > last year i had gotten a site recommended to me on this forum. it > helped > > > you break down the results/score you were getting from your queries. > > > > http://splainer.io/ perhaps? > > > > - Toke Eskildsen > > > -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
Re: online scoring explanation
amazing, thanks! -- *John Blythe* Product Manager & Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, Feb 9, 2016 at 6:04 AM, Vincenzo D'Amore wrote: > Hi, > > I did a chrome extension: > > > https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl > > > Hope this helps, > Vincenzo > > > On Tue, Feb 9, 2016 at 11:39 AM, John Blythe wrote: > > > that's it! > > > > and doug is the one from back in the day :) > > > > thanks guys > > > > -- > > *John Blythe* > > Product Manager & Lead Developer > > > > 251.605.3071 | j...@curvolabs.com > > www.curvolabs.com > > > > 58 Adams Ave > > Evansville, IN 47713 > > > > On Mon, Feb 8, 2016 at 3:08 PM, Toke Eskildsen > > wrote: > > > > > John Blythe wrote: > > > > last year i had gotten a site recommended to me on this forum. it > > helped > > > > you break down the results/score you were getting from your queries. > > > > > > http://splainer.io/ perhaps? > > > > > > - Toke Eskildsen > > > > > > > > > -- > Vincenzo D'Amore > email: v.dam...@gmail.com > skype: free.dev > mobile: +39 349 8513251 >
Custom JSON facet functions
Hi - I must be missing something, but is it possible to declare custom JSON facet functions in solrconfig.xml? Just like we would do with request handlers or search components? Thanks, Markus
Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7
Shahzad - I am curious which features of distributed search stop you from running SolrCloud. Using DS, you would be able to search across cores or collections. https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options Thanks, Susheel On Tue, Feb 9, 2016 at 12:10 AM, Shahzad Masud < shahzad.ma...@northbaysolutions.net> wrote: > Thank you Shawn for your response. I would be running some performance > tests lately on this structure (one JVM with multiple cores), and would > share feedback on this thread. > > >There IS a way to specify the solr home for a specific context, but keep > >in mind that I definitely DO NOT recommend doing this. There is > >resource and administrative overhead to running multiple copies of Solr > >in one JVM. Simply run one context and let it handle multiple shards, > >whether you choose SolrCloud or not. > Due to distributed search feature, I might not be able to run SolrCloud. I > would appreciate, if you please share that way of setting solr home for a > specific context in Jetty-Solr. Its good to seek more information for > comparison purposes. Do you think having multiple JVMs would increase or > decrease performance. My document base is around 20 million rows (in 24 > shards), with document size ranging from 100KB - 400 MB. > > SM > > On Mon, Feb 8, 2016 at 8:09 PM, Shawn Heisey wrote: > > > On 2/8/2016 1:14 AM, Shahzad Masud wrote: > > > Thank you Shawn for your reply. Here is my structure of cores and > shards > > > > > > Shard 1 = localhost:8983/solr_2014 [3 Core - Employee, Service > Tickets, > > > Departments] > > > Shard 2 = localhost:8983/solr_2015 [3 Core - Employee, Service > Tickets, > > > Departments] > > > Shard 3 = localhost:8983/solr_2016 [3 Core - Employee, Service > Tickets, > > > Departments] > > > > > > While searching, I use distributed search feature to search data from > all > > > three shards in respective cores e.g. 
If I want to search from Employee > > > data for all three years, I search from Employee core of three > contexts. > > > This is legacy design, do you think this is okay, or this require > > immediate > > > restructure / design? I am going to try this, > > > > > > Context = localhost:8982/solr (9 cores - Employee-2014, Employee-2015, > > > Employee-2016, ServiceTickets-2014, ServiceTickets-2015, > > > ServiceTickets-2016, Department-2014, Department-2015, Department-2016] > > > distributed search would be from all three cores of same data category > > > (i.e. For Employee search, it would be from Employee-2014, > Employee-2015, > > > Employee-2016). > > > > With SolrCloud, you can have multiple collections for each of these > > types and alias them together. Or you can simply have one collection > > for employee, one for servicetickets, and one for department, with > > SolrCloud automatically handling splitting those documents into the > > number of shardsthat you specify when you create the collection. You > > can also do manual sharding and split each collection on a time basis > > like you have been doing, but then you lose some of the automation that > > SolrCloud provides, so I do not recommend handling it that way. > > > > > Regarding one Solr context per jetty; I cannot run two solr contexts > > > pointing to different data in Jetty, as while starting jetty I have to > > > provide -Dsolr.solr.home variable - which ends up pointing to one data > > > folder (2014 data) only. > > > > You do not need multiple contexts to have multiple indexes. > > > > My dev Solr server has exactly one Solr JVM, with exactly one context -- > > /solr. That instance of Solr has 45 indexes (cores) on it. These 45 > > cores are various shards for three larger indexes. I am not running > > SolrCloud, but I certainly could. 
> > > > You can see 25 of the 45 cores in my Solr instance in this screenshot of > > the admin UI for this server: > > > > https://www.dropbox.com/s/v87mxvkdejvd92h/solr-with-45-cores.png?dl=0 > > > > There IS a way to specify the solr home for a specific context, but keep > > in mind that I definitely DO NOT recommend doing this. There is > > resource and administrative overhead to running multiple copies of Solr > > in one JVM. Simply run one context and let it handle multiple shards, > > whether you choose SolrCloud or not. > > > > Thanks, > > Shawn > > > > >
Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7
Susheel, thank you for asking. I am using joins of two cores (employee, department, servicetickets), which isn't support by SolrCloud - last time I check. Not sure if this (advanced distributed request option) was present in 4.10. Do you think, I am missing something here? Shahzad On Tue, Feb 9, 2016 at 6:39 PM, Susheel Kumar wrote: > Shahzad - I am curious what features of distributed search stops you to run > SolrCloud. Using DS, you would be able to search across cores or > collections. > > https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options > > Thanks, > Susheel > > On Tue, Feb 9, 2016 at 12:10 AM, Shahzad Masud < > shahzad.ma...@northbaysolutions.net> wrote: > > > Thank you Shawn for your response. I would be running some performance > > tests lately on this structure (one JVM with multiple cores), and would > > share feedback on this thread. > > > > >There IS a way to specify the solr home for a specific context, but keep > > >in mind that I definitely DO NOT recommend doing this. There is > > >resource and administrative overhead to running multiple copies of Solr > > >in one JVM. Simply run one context and let it handle multiple shards, > > >whether you choose SolrCloud or not. > > Due to distributed search feature, I might not be able to run SolrCloud. > I > > would appreciate, if you please share that way of setting solr home for a > > specific context in Jetty-Solr. Its good to seek more information for > > comparison purposes. Do you think having multiple JVMs would increase or > > decrease performance. My document base is around 20 million rows (in 24 > > shards), with document size ranging from 100KB - 400 MB. > > > > SM > > > > On Mon, Feb 8, 2016 at 8:09 PM, Shawn Heisey > wrote: > > > > > On 2/8/2016 1:14 AM, Shahzad Masud wrote: > > > > Thank you Shawn for your reply. 
Here is my structure of cores and > > shards > > > > > > > > Shard 1 = localhost:8983/solr_2014 [3 Core - Employee, Service > > Tickets, > > > > Departments] > > > > Shard 2 = localhost:8983/solr_2015 [3 Core - Employee, Service > > Tickets, > > > > Departments] > > > > Shard 3 = localhost:8983/solr_2016 [3 Core - Employee, Service > > Tickets, > > > > Departments] > > > > > > > > While searching, I use distributed search feature to search data from > > all > > > > three shards in respective cores e.g. If I want to search from > Employee > > > > data for all three years, I search from Employee core of three > > contexts. > > > > This is legacy design, do you think this is okay, or this require > > > immediate > > > > restructure / design? I am going to try this, > > > > > > > > Context = localhost:8982/solr (9 cores - Employee-2014, > Employee-2015, > > > > Employee-2016, ServiceTickets-2014, ServiceTickets-2015, > > > > ServiceTickets-2016, Department-2014, Department-2015, > Department-2016] > > > > distributed search would be from all three cores of same data > category > > > > (i.e. For Employee search, it would be from Employee-2014, > > Employee-2015, > > > > Employee-2016). > > > > > > With SolrCloud, you can have multiple collections for each of these > > > types and alias them together. Or you can simply have one collection > > > for employee, one for servicetickets, and one for department, with > > > SolrCloud automatically handling splitting those documents into the > > > number of shardsthat you specify when you create the collection. You > > > can also do manual sharding and split each collection on a time basis > > > like you have been doing, but then you lose some of the automation that > > > SolrCloud provides, so I do not recommend handling it that way. 
> > > > > > > Regarding one Solr context per jetty; I cannot run two solr contexts > > > > pointing to different data in Jetty, as while starting jetty I have > to > > > > provide -Dsolr.solr.home variable - which ends up pointing to one > data > > > > folder (2014 data) only. > > > > > > You do not need multiple contexts to have multiple indexes. > > > > > > My dev Solr server has exactly one Solr JVM, with exactly one context > -- > > > /solr. That instance of Solr has 45 indexes (cores) on it. These 45 > > > cores are various shards for three larger indexes. I am not running > > > SolrCloud, but I certainly could. > > > > > > You can see 25 of the 45 cores in my Solr instance in this screenshot > of > > > the admin UI for this server: > > > > > > https://www.dropbox.com/s/v87mxvkdejvd92h/solr-with-45-cores.png?dl=0 > > > > > > There IS a way to specify the solr home for a specific context, but > keep > > > in mind that I definitely DO NOT recommend doing this. There is > > > resource and administrative overhead to running multiple copies of Solr > > > in one JVM. Simply run one context and let it handle multiple shards, > > > whether you choose SolrCloud or not. > > > > > > Thanks, > > > Shawn > > > > > > > > >
Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7
On 2/8/2016 10:10 PM, Shahzad Masud wrote: > Due to distributed search feature, I might not be able to run > SolrCloud. I would appreciate, if you please share that way of setting > solr home for a specific context in Jetty-Solr. Its good to seek more > information for comparison purposes. Do you think having multiple JVMs > would increase or decrease performance. My document base is around 20 > million rows (in 24 shards), with document size ranging from 100KB - > 400 MB. SM For most people, the *entire point* of running SolrCloud is to do distributed search, so to hear that you can't run SolrCloud because of distributed search is very confusing to me. I admit to ignorance when it comes to the join feature in Solr ... but it is my understanding that all you need to make joins work properly is to have both of the indexes that you are joining running in the same JVM and the same Solr instance. If you arrange your SolrCloud replicas so a copy of every index is loaded on every server, I think that would satisfy this requirement. I may be wrong, but I believe there are SolrCloud users that use the join feature. When you create a config file for a Solr context, whether it's Jetty, Tomcat, or some other container, you can set the solr/home JNDI variable in the context fragment to set the solr home for that context. I found a specific example for Tomcat. I know Jetty can do the same, but I do not know how to actually create the context fragment. https://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat I need to reiterate one point again. You should only run one Solr container per server, with exactly one Solr context installed in that server. This is recommended whether you're running SolrCloud or not, and whether you're using distributed search or not. One Solr context can handle a LOT of indexes. Running multiple Solr instances per server is only recommended in one case: Extremely large indexes where you would need a very large heap. 
Running two JVMs with smaller heaps *might* be more efficient ... but in that case, it is usually better to split those indexes between two separate servers, each one running only one instance of Solr. Thanks, Shawn
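Shawn mentions Jetty can do the same as the Tomcat example but doesn't show the fragment. A Jetty 8-era context fragment setting the solr/home JNDI variable would look roughly like the following (an untested sketch; the context path, war location, and solr home paths are placeholders, and the four-argument EnvEntry form follows the old SolrJetty wiki pattern):

```xml
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
          "http://www.eclipse.org/jetty/configure.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <Set name="contextPath">/solr2014</Set>
  <Set name="war">/opt/solr/webapps/solr.war</Set>
  <!-- JNDI entry the Solr webapp looks up as java:comp/env/solr/home -->
  <New id="solrHome" class="org.eclipse.jetty.plus.jndi.EnvEntry">
    <Arg></Arg>                                    <!-- scope: this webapp -->
    <Arg>solr/home</Arg>                           <!-- JNDI name -->
    <Arg type="java.lang.String">/data/solr2014</Arg>
    <Arg type="boolean">true</Arg>                 <!-- override web.xml -->
  </New>
</Configure>
```

The fragment is dropped into Jetty's contexts/ directory, one file per context; as Shawn says above, though, one context with many cores is the recommended layout.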
Re: Custom JSON facet functions
On Tue, Feb 9, 2016 at 7:10 AM, Markus Jelsma wrote: > Hi - i must have missing something but is it possible to declare custom JSON > facet functions in solrconfig.xml? Just like we would do with request > handlers or search components? Yes, but it will probably change: https://issues.apache.org/jira/browse/SOLR-7447 So currently, you would register a facet function just like a custom function (value source), but just put "_agg" at the end of the name and implement AggValueSource. So for example, the "sum" facet function is registered as "sum_agg" and implements AggValueSource (the class is SumAgg) So if you utilize this, just realize that the mechanism and interfaces are experimental and subject to change (and probably will change at some point for this in particular). -Yonik
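Concretely, the registration Yonik describes would sit in solrconfig.xml like a normal custom function; the name and package below are invented for illustration (only the "_agg" suffix convention and the AggValueSource interface come from his answer, and both are experimental per SOLR-7447):

```xml
<!-- Registered as mystat_agg, usable in JSON facets as mystat(field).
     MyStatAggParser would extend ValueSourceParser and have its parse()
     return an AggValueSource implementation (compare SumAgg for "sum"). -->
<valueSourceParser name="mystat_agg" class="com.example.MyStatAggParser"/>
```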
Re: /solr/admin/ping causing exceptions in log?
Nathan, Did you ever get to the bottom of this issue? I'm encountering exactly the same problem with haproxy 1.6.2; health checks throwing occasional errors and the connection being closed by haproxy. Daniel Pool
RE: Custom JSON facet functions
Nice! Are the aggregations also going to be pluggable? Reading the ticket, i would assume it is going to be pluggable. Thanks, Markus -Original message- > From:Yonik Seeley > Sent: Tuesday 9th February 2016 15:25 > To: solr-user@lucene.apache.org > Subject: Re: Custom JSON facet functions > > On Tue, Feb 9, 2016 at 7:10 AM, Markus Jelsma > wrote: > > Hi - i must have missing something but is it possible to declare custom > > JSON facet functions in solrconfig.xml? Just like we would do with request > > handlers or search components? > > Yes, but it will probably change: > https://issues.apache.org/jira/browse/SOLR-7447 > > So currently, you would register a facet function just like a custom > function (value source), > but just put "_agg" at the end of the name and implement AggValueSource. > So for example, the "sum" facet function is registered as "sum_agg" > and implements AggValueSource (the class is SumAgg) > > So if you utilize this, just realize that the mechanism and interfaces > are experimental and subject to change (and probably will change at > some point for this in particular). > > -Yonik >
Re: Custom JSON facet functions
On Tue, Feb 9, 2016 at 10:02 AM, Markus Jelsma wrote: > Nice! Are the aggregations also going to be pluggable? Reading the ticket, i > would assume it is going to be pluggable. Yep. -Yonik > Thanks, > Markus > > -Original message- >> From:Yonik Seeley >> Sent: Tuesday 9th February 2016 15:25 >> To: solr-user@lucene.apache.org >> Subject: Re: Custom JSON facet functions >> >> On Tue, Feb 9, 2016 at 7:10 AM, Markus Jelsma >> wrote: >> > Hi - i must have missing something but is it possible to declare custom >> > JSON facet functions in solrconfig.xml? Just like we would do with request >> > handlers or search components? >> >> Yes, but it will probably change: >> https://issues.apache.org/jira/browse/SOLR-7447 >> >> So currently, you would register a facet function just like a custom >> function (value source), >> but just put "_agg" at the end of the name and implement AggValueSource. >> So for example, the "sum" facet function is registered as "sum_agg" >> and implements AggValueSource (the class is SumAgg) >> >> So if you utilize this, just realize that the mechanism and interfaces >> are experimental and subject to change (and probably will change at >> some point for this in particular). >> >> -Yonik >>
Re: SolrCloud behavior when a ZooKeeper node goes down
On 2/8/2016 1:09 PM, Kelly, Frank wrote: > We are running a small SolrCloud instance on AWS > > Solr : Version 5.3.1 > ZooKeeper: Version 3.4.6 > > 3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS) > 3 x Solr Nodes (8 GB of memory each – 2 collections with 3 shards for > each collection) > > Let’s call the ZooKeeper nodes A, B and C. > One of our ZooKeeper nodes (B) failed a health check and was replaced > due to autoscaling , but during this time of failover > our SolrCloud cluster became unavailable. All new connections to Solr > were unable to connect complaining about connectivity issues > and preexisting connections also had errors > > I thought because we had configured SolrCloud to point at all three ZK > nodes that the failure of one ZK node would be OK (since we still had > a quorum). > Did I misunderstand something about SolrCloud and its relationship > with ZK? That's supposed to be how Zookeeper and SolrCloud work, if everything is configured properly and has full network connectivity. What is your zkHost string for Solr? Is the zkHost value the same on all three SolrCloud nodes? It should be identical on all of them, and every server should be able to directly reach every other server on all relevant ports. > The weird thing now is that when the new ZooKeeper node (D) started up > – after a few minutes we could connect to SolrCloud again even though > we were still only pointing to A,B and C (not D). > Any thoughts on why this also happened? This sounds odd. The exceptions that you outlined are from *client* code (CloudSolrClient), not the Solr servers. CloudSolrClient instances should normally be constructed using the same zkHost string that your Solr servers use, listing all of the zookeeper servers. Is this how they are set up? I am unsure how all this might be affected by the internal/external addressing that AWS uses. Thanks, Shawn
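As a concrete reference point for the zkHost question above: every Solr node and every CloudSolrClient should be handed the identical ensemble string listing all three ZooKeeper nodes (hostnames and the /solr chroot below are placeholders):

```shell
# Start each Solr 5.x node against the full ensemble, not a single ZK host:
bin/solr start -c -z "zkA:2181,zkB:2181,zkC:2181/solr"
```

On the client side, the 5.3-era constructor new CloudSolrClient("zkA:2181,zkB:2181,zkC:2181/solr") should be given that same string; pointing clients at a single ZooKeeper node is one way to see exactly the symptom described, where one ZK failure takes out client connectivity despite a surviving quorum.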
Re: Solr architecture
Hi, Thanks for all your suggestions. I took some time to get the details to be more accurate. Please find what I have gathered:- My data being indexed is something like this. I am basically capturing all data related to a user session. Inside a session I have categorized my actions like actionA, actionB etc.., per page. So each time an action pertaining to say actionA or actionB etc.. (in each page) happens, it is updated in Solr under that session (sessionId). So in short there is only one doc pertaining to a single session (identified by sessionid) in my Solr index and that is retrieved and updated whenever a new action under that session occurs. We expect up to 4 million sessions per day. On an average *one session's* *doc has a size* of *3MB to 20MB*. So if it is *4 million sessions per day*, each session writing around *500 times to Solr*, it is *2 billion writes or (indexing) per day to Solr*. As it is one doc per session, it is *4 million docs per day*. This is around *80K docs indexed per second* during *peak* hours and around *15K docs indexed per second* into Solr during *non-peak* hours. Number of queries per second is around *320 queries per second*. 1. Average size of a doc 3MB to 20MB 2. Query types:- While that session is in progress, whatever data is there for that session so far is queried and the new action's details captured and appended to existing data already captured related to that session and indexed back into Solr. So, the longer the session, the more data is retrieved for each subsequent query to get current data captured for that session. Also querying can be done on timestamp etc... which is captured along with each action. 3. Are docs grouped somehow? All data related to a session are retrieved from Solr, updated and indexed back to Solr based on sessionId. No other grouping. 4. Are they time sensitive (NRT or offline process does this) As mentioned above this is in NRT. 
Each time a new user action in that session happens, we need to query existing session info already captured related to that session and append this new data to this existing info retrieved and index it back to Solr. 5. Will they update or it is rebuild every time, etc. Each time a new user action occurs, the full data pertaining to that session so far captured is retrieved from Solr, the extra latest data pertaining to this new action is appended and indexed back to Solr. 6. And the other thing you haven't told us is whether you plan on _adding_ 2B docs a day or whether that number is the total corpus size and you are re-indexing the 2B docs/day. IOW, if you are adding 2B docs/day, 30 days later do you have 2B docs or 60B docs in your corpus? We are expecting around 4 million sessions per day (per session 500 writes to Solr), which turns out to be 2B indexing done per day. So after 30 days it would be 4 million * 30 docs in the index. 7. Is there any aging of docs No we always query against the whole corpus present. 8. Is any doc deleted? No all data remains in the index Any suggestion is very welcome! Thanks! Mark. On Mon, Feb 8, 2016 at 3:30 PM, Jack Krupansky wrote: > Oops... at 100 qps for a single node you would need 120 nodes to get to 12K > qps and 800 nodes to get 80K qps, but that is just an extremely rough > ballpark estimate, not some precise and firm number. And that's if all the > queries can be evenly distributed throughout the cluster and don't require > fanout to other shards, which effectively turns each incoming query into n > queries where n is the number of shards. > > -- Jack Krupansky > > On Mon, Feb 8, 2016 at 12:07 PM, Jack Krupansky > wrote: > > > So is there any aging or TTL (in database terminology) of older docs? > > > > And do all of your queries need to query all of the older documents all > of > > the time or is there a clear hierarchy of querying for aged documents, > like > > past 24-hours vs. past week vs. past year vs. 
older than a year? Sure, > you > > can always use a function query to boost by the inverse of document age, > > but Solr would be more efficient with filter queries or separate indexes > > for different time scales. > > > > Are documents ever updated or are they write-once? > > > > Are documents explicitly deleted? > > > > Technically you probably could meet those specs, but... how many > > organizations have the resources and the energy to do so? > > > > As a back of the envelope calculation, if Solr gave you 100 queries per > > second per node, that would mean you would need 1,200 nodes. It would > also > > depend on whether those queries are very narrow so that a single node can > > execute them or if they require fanout to other shards and then > aggregation > > of results from those other shards. > > > > -- Jack Krupansky > > > > On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson > > > wrote: > > > >> Short form: You really have to prototype. Here's the long form: > >> > >> > >> > https:
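Jack's back-of-the-envelope arithmetic quoted above (100 qps per node, so 120 nodes for 12K qps and 800 for 80K, more if queries fan out to shards) can be sketched as a tiny sizing helper; treating fan-out and replication as simple multipliers is a deliberate simplification, not a real capacity model:

```python
import math

def nodes_needed(target_qps, per_node_qps, shards=1, replicas=1):
    """Rough node count for a query load: each incoming query fans out
    into one sub-query per shard, so shard-level load is target_qps
    times the shard count; replicas scale the node count linearly."""
    shard_level_qps = target_qps * shards
    return math.ceil(shard_level_qps / per_node_qps) * replicas
```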
Re: /solr/admin/ping causing exceptions in log?
On 2/9/2016 7:01 AM, Daniel Pool wrote: > Did you ever get to the bottom of this issue? I'm encountering exactly the > same problem with haproxy 1.6.2; health checks throwing occasional errors and > the connection being closed by haproxy. Your message did not include any quotes from the original thread, or mention the specific problem you are seeing. The thread that you replied to is nearly a year and a half old, so I had already archived it to a "2014" folder. If you are seeing "EofException" in your logs like the original poster was, then this is happening because Solr is taking longer to respond to the query than whichever timeout in haproxy is active for that request, so haproxy closed the TCP connection, resulting in that error in the Solr log. The underlying issue is likely a performance problem. If you are seeing a different problem than EofException, please give us full details. Here is all the generic info I have on performance problems with Solr: https://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
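On the haproxy side, the usual remedy for the timeout scenario Shawn describes is to make the server and check timeouts comfortably exceed Solr's slowest expected response. The directives below are standard haproxy 1.5/1.6 syntax, but the values and addresses are illustrative only:

```
backend solr_backend
    # health check against Solr's ping handler
    option httpchk GET /solr/admin/ping
    # allow longer than Solr's slowest expected query, otherwise
    # haproxy closes the socket and Solr logs an EofException
    timeout server 60s
    timeout check  10s
    server solr1 192.168.0.11:8983 check inter 5s
```

Raising timeouts only hides the symptom, of course; if queries routinely run that long, the performance page Shawn links is the real fix.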
Knowing which doc failed to get added in solr during bulk addition in Solr 5.2
Hi, I have Document Centric Versioning Constraints added in my Solr config:- false doc_version I am adding multiple documents to Solr in a single call using SolrJ 5.2. The code fragment looks something like this:- try { UpdateResponse resp = solrClient.add(docs.getDocCollection(), 500); if (resp.getStatus() != 0) { throw new Exception(new StringBuilder( "Failed to add docs in solr ").append(resp.toString()) .toString()); } } catch (Exception e) { logError("Adding docs to solr failed", e); } If one of the documents violates the versioning constraints then Solr returns an exception with an error message like "user version is not high enough: 1454587156" & the other documents are added fine. Is there a way I can tell which document violated the constraints, either from the Solr logs or from the UpdateResponse returned by Solr? Thanks
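(The configuration snippet above lost its XML markup in the archive, leaving only the stray values "false" and "doc_version". Judging from those values, the underlying update-processor configuration was probably along these lines; treat this as a best-effort reconstruction, not the poster's exact config:)

```xml
<updateRequestProcessorChain name="versioned" default="true">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <bool name="ignoreOldUpdates">false</bool>
    <str name="versionField">doc_version</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```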
Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2
This has been a long standing issue, Hoss is doing some current work on it see: https://issues.apache.org/jira/browse/SOLR-445 But the short form is "no, not yet". Best, Erick On Tue, Feb 9, 2016 at 8:19 AM, Debraj Manna wrote: > Hi, > > > > I have a Document Centric Versioning Constraints added in solr schema:- > > > false > doc_version > > > I am adding multiple documents in solr in a single call using SolrJ 5.2. > The code fragment looks something like below :- > > > try { > UpdateResponse resp = solrClient.add(docs.getDocCollection(), > 500); > if (resp.getStatus() != 0) { > throw new Exception(new StringBuilder( > "Failed to add docs in solr ").append(resp.toString()) > .toString()); > } > } catch (Exception e) { > logError("Adding docs to solr failed", e); > } > > > If one of the document is violating the versioning constraints then Solr is > returning an exception with error message like "user version is not high > enough: 1454587156" & the other documents are getting added perfectly. Is > there a way I can know which document is violating the constraints either > in Solr logs or from the Update response returned by Solr? > > Thanks
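Until SOLR-445 lands, a common client-side workaround is to retry a failed batch one document at a time to isolate the offender. The sketch below shows the idea with the SolrJ call abstracted behind an injected `add_batch` callable (which, like SolrJ's client.add(), is assumed to raise on rejection); the names and the stub are illustrative:

```python
def add_with_isolation(docs, add_batch):
    """Bulk-add with a fallback that identifies rejected documents.

    add_batch(list_of_docs) stands in for the real indexing call and
    raises on rejection (e.g. the "user version is not high enough"
    constraint violation).  On a batch failure we retry one document
    at a time, trading throughput for knowing exactly which documents
    were refused; the happy path keeps full batch speed.
    """
    try:
        add_batch(docs)
        return []                           # whole batch accepted
    except Exception:
        failed = []
        for doc in docs:
            try:
                add_batch([doc])
            except Exception as e:
                failed.append((doc, str(e)))
        return failed
```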
Re: Solr architecture
Bear in mind that Lucene is optimised for high read, low write. That is, it puts in a lot of effort at write time to make reading efficient. It sounds like you are going to be doing far more writing than reading, and I wonder whether you are necessarily choosing the right tool for the job. How would you later use this data, and what advantage is there to storing it in Solr? Upayavira On Tue, Feb 9, 2016, at 03:40 PM, Mark Robinson wrote: > Hi, > Thanks for all your suggestions. I took some time to get the details to > be > more accurate. Please find what I have gathered:- > > My data being indexed is something like this. > I am basically capturing all data related to a user session. > Inside a session I have categorized my actions like actionA, actionB > etc.., > per page. > So each time an action pertaining to say actionA or actionB etc.. (in > each > page) happens, it is updated in Solr under that session (sessionId). > > So in short there is only one doc pertaining to a single session > (identified by sessionId) in my Solr index, and that is retrieved and > updated > whenever a new action under that session occurs. > We expect up to 4 million sessions per day. > > On average *one session's* *doc has a size* of *3MB to 20MB*. > So if it is *4 million sessions per day*, each session writing around *500 > times to Solr*, it is *2 billion writes (indexing operations) per day to Solr*. > As it is one doc per session, it is *4 million docs per day*. > This is around *80K docs indexed per second* during *peak* hours and > around *15K > docs indexed per second* into Solr during *non-peak* hours. > Number of queries per second is around *320 queries per second*. > > > 1. Average size of a doc > 3MB to 20MB > 2. Query types:- > While that session is in progress, whatever data is there for that > session so far is queried and the new action's details captured and > appended to existing data already captured related to that session > and indexed back into Solr. So, the longer the session, the more data is > retrieved > for each subsequent query to get the current data captured for that session. > Also querying can be done on timestamp etc. which is captured > along > with each action. > 3. Are docs grouped somehow? > All data related to a session is retrieved from Solr, updated and > indexed back to Solr based on sessionId. No other grouping. > 4. Are they time sensitive (NRT or offline process does this) > As mentioned above, this is in NRT. Each time a new user action in > that > session happens, we need to query the existing session info already captured > related to that session, and append this new data to the existing > info retrieved and index it back to Solr. > 5. Will they update or is it rebuilt every time, etc. > Each time a new user action occurs, the full data pertaining to that > session so far captured is retrieved from Solr, the extra latest data > pertaining to this new action is appended and indexed back to Solr. > 6. And the other thing you haven't told us is whether you plan on > _adding_ > 2B docs a day or whether that number is the total corpus size and you are > re-indexing the 2B docs/day. IOW, if you are adding 2B docs/day, 30 days > later do you have 2B docs or 60B docs in your > corpus? > We are expecting around 4 million sessions per day (per session 500 > writes to Solr), which turns out to be 2B indexing operations done per day. So after > 30 days it would be 4 million * 30 docs in the index. > 7. Is there any aging of docs > No, we always query against the whole corpus present. > 8. Is any doc deleted? > No, all data remains in the index > > Any suggestion is very welcome! > > Thanks! > Mark. > > > On Mon, Feb 8, 2016 at 3:30 PM, Jack Krupansky > wrote: > > > Oops... at 100 qps for a single node you would need 120 nodes to get to 12K > > qps and 800 nodes to get 80K qps, but that is just an extremely rough > > ballpark estimate, not some precise and firm number.
And that's if all the > > queries can be evenly distributed throughout the cluster and don't require > > fanout to other shards, which effectively turns each incoming query into n > > queries where n is the number of shards. > > > > -- Jack Krupansky > > > > On Mon, Feb 8, 2016 at 12:07 PM, Jack Krupansky > > wrote: > > > > > So is there any aging or TTL (in database terminology) of older docs? > > > > > > And do all of your queries need to query all of the older documents all > > of > > > the time or is there a clear hierarchy of querying for aged documents, > > like > > > past 24-hours vs. past week vs. past year vs. older than a year? Sure, > > you > > > can always use a function query to boost by the inverse of document age, > > > but Solr would be more efficient with filter queries or separate indexes > > > for different time scales. > > > > > > Are documents ever updated or are they write-once? > > > > > > Are documents explicitly deleted? > > > > > > Techni
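[Editorial note: the throughput figures in this thread are easy to sanity-check. A tiny sketch of the arithmetic, using only the 4M sessions/day and 500 writes/session figures quoted above:]

```java
public class IndexRateEstimate {
    public static void main(String[] args) {
        long sessionsPerDay = 4_000_000L;       // figure quoted in the thread
        long writesPerSession = 500L;           // updates per session
        long writesPerDay = sessionsPerDay * writesPerSession;
        long secondsPerDay = 24L * 3600L;
        long avgWritesPerSec = writesPerDay / secondsPerDay;
        System.out.println("writes/day = " + writesPerDay);
        System.out.println("avg writes/sec = " + avgWritesPerSec);
        // The quoted 80K docs/sec peak is about 3.5x this daily average,
        // a plausible peak-to-average ratio for session traffic.
    }
}
```

So the 2B writes/day figure averages out to roughly 23K writes/sec, with the 80K/sec peak figure implying the load is heavily concentrated in peak hours.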
Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7
Shahzad - As Shawn mentioned, you can get a lot of input from the folks who are using joins in SolrCloud if you start a new thread, and I would suggest taking a look at Solr Streaming Expressions and the Parallel SQL Interface, which cover joining use cases as well. Thanks, Susheel On Tue, Feb 9, 2016 at 9:17 AM, Shawn Heisey wrote: > On 2/8/2016 10:10 PM, Shahzad Masud wrote: > > Due to the distributed search feature, I might not be able to run > > SolrCloud. I would appreciate it if you could share that way of setting > > solr home for a specific context in Jetty-Solr. It's good to seek more > > information for comparison purposes. Do you think having multiple JVMs > > would increase or decrease performance? My document base is around 20 > > million rows (in 24 shards), with document size ranging from 100KB - > > 400 MB. SM > > For most people, the *entire point* of running SolrCloud is to do > distributed search, so to hear that you can't run SolrCloud because of > distributed search is very confusing to me. > > I admit to ignorance when it comes to the join feature in Solr ... but > it is my understanding that all you need to make joins work properly is > to have both of the indexes that you are joining running in the same JVM > and the same Solr instance. If you arrange your SolrCloud replicas so a > copy of every index is loaded on every server, I think that would > satisfy this requirement. I may be wrong, but I believe there are > SolrCloud users that use the join feature. > > When you create a config file for a Solr context, whether it's Jetty, > Tomcat, or some other container, you can set the solr/home JNDI variable > in the context fragment to set the solr home for that context. I found > a specific example for Tomcat. I know Jetty can do the same, but I do > not know how to actually create the context fragment. > > > https://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat > > I need to reiterate one point.
You should only run one Solr > container per server, with exactly one Solr context installed in that > server. This is recommended whether you're running SolrCloud or not, > and whether you're using distributed search or not. One Solr context > can handle a LOT of indexes. > > Running multiple Solr instances per server is only recommended in one > case: Extremely large indexes where you would need a very large heap. > Running two JVMs with smaller heaps *might* be more efficient ... but in > that case, it is usually better to split those indexes between two > separate servers, each one running only one instance of Solr. > > Thanks, > Shawn > >
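[Editorial note: the Tomcat example Shawn references boils down to a small context fragment. This is a sketch only - the docBase and solr home paths below are placeholders to adapt to your layout, and Jetty uses its own context XML format rather than this one:]

```xml
<!-- Hypothetical Tomcat context fragment, e.g. conf/Catalina/localhost/solr.xml.
     The solr/home JNDI variable tells the Solr webapp where its solr home is. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```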
replicate indexing to second site
I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my primary data center. I am now trying to plan for disaster recovery with an available warm site. I have read (many times) the disaster recovery section in the Apache ref guide. I suppose I don't fully understand it. What I'd like to know is the best way to sync up the existing data, and the best way to keep that data in sync. Assume that the warm site is an exact copy (not at the network level) of the production cluster - so the same servers with the same config. All servers are virtual. The use case is the active cluster goes down and cannot be repaired, so the warm site would become the active site. This is a manual process that takes many hours to accomplish (I just need to fit Solr into this existing process, I can't change the process :). I expect that rsync can be used initially to copy the collection data folders and the zookeeper data and transaction log folders. So after verifying Solr/ZK is functional after the install, shut it down and perform the copy. This may sound slow but my production index size is < 100GB. Is this approach reasonable? So now to keep the warm site in sync, I could use rsync on a scheduled basis but I assume there's a better way. The ref guide says to send all indexing requests to the second cluster at the same time they are sent to the active cluster. I use SolrJ for all requests. So would this entail using a second CloudSolrClient instance that only knows about the second cluster? Seems reasonable but I don't want to lengthen the response time for the users. Is this just a software problem to work out (separate thread)? Or is there a SolrJ solution (async calls)? Thanks!! -- View this message in context: http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: replicate indexing to second site
There is a Cross Datacenter replication feature in the works - not sure of its status. In lieu of that, I'd simply have two copies of your indexing code - index everything simultaneously into both clusters. There are, of course, risks that both get out of sync, so you might want to find some ways to identify/manage that. Upayavira On Tue, Feb 9, 2016, at 08:43 PM, tedsolr wrote: > I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my > primary > data center. I am now trying to plan for disaster recovery with an > available > warm site. I have read (many times) the disaster recovery section in the > Apache ref guide. I suppose I don't fully understand it. > > What I'd like to know is the best way to sync up the existing data, and > the > best way to keep that data in sync. Assume that the warm site is an exact > copy (not at the network level) of the production cluster - so the same > servers with the same config. All servers are virtual. The use case is > the > active cluster goes down and cannot be repaired, so the warm site would > become the active site. This is a manual process that takes many hours to > accomplish (I just need to fit Solr into this existing process, I can't > change the process :). > > I expect that rsync can be used initially to copy the collection data > folders and the zookeeper data and transaction log folders. So after > verifying Solr/ZK is functional after the install, shut it down and > perform > the copy. This may sound slow but my production index size is < 100GB. Is > this approach reasonable? > > So now to keep the warm site in sync, I could use rsync on a scheduled > basis > but I assume there's a better way. The ref guide says to send all > indexing > requests to the second cluster at the same time they are sent to the > active > cluster. I use SolrJ for all requests. So would this entail using a > second > CloudSolrClient instance that only knows about the second cluster?
Seems > reasonable but I don't want to lengthen the response time for the users. > Is > this just a software problem to work out (separate thread)? Or is there a > SolrJ solution (async calls)? > > Thanks!!
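[Editorial note: the "separate thread" idea in the question can be sketched in plain Java. This is an illustration only - IndexTarget is a hypothetical stand-in for a client bound to one cluster (in practice, one CloudSolrClient per cluster), and failure handling is reduced to a comment:]

```java
import java.util.*;
import java.util.concurrent.*;

public class DualIndexer {
    // Hypothetical stand-in for a client bound to one cluster.
    interface IndexTarget { void add(Map<String, Object> doc) throws Exception; }

    private final IndexTarget primary;
    private final IndexTarget secondary;
    // Secondary writes happen off the request thread, so user-facing
    // latency tracks the primary cluster only.
    private final ExecutorService secondaryPool = Executors.newSingleThreadExecutor();

    DualIndexer(IndexTarget primary, IndexTarget secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void add(Map<String, Object> doc) throws Exception {
        primary.add(doc);              // synchronous: caller sees the primary result
        secondaryPool.submit(() -> {   // asynchronous: fire-and-forget to the warm site
            try { secondary.add(doc); }
            catch (Exception e) { /* real code: log and queue for replay */ }
        });
    }

    void shutdown() throws InterruptedException {
        secondaryPool.shutdown();
        secondaryPool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // In-memory lists stand in for the two clusters.
        List<Map<String, Object>> a = Collections.synchronizedList(new ArrayList<>());
        List<Map<String, Object>> b = Collections.synchronizedList(new ArrayList<>());
        DualIndexer d = new DualIndexer(a::add, b::add);
        d.add(Map.<String, Object>of("id", "1", "title", "hello"));
        d.shutdown();
        System.out.println(a.size() + " " + b.size());
    }
}
```

Note the fire-and-forget write is exactly where the two clusters can diverge if the secondary is down, which is the risk the replies below this message discuss.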
How is Tika used with Solr
Hi folks, I'm writing a file-system-crawler that will index files. The file system is going to be very busy and I anticipate on average 10 new updates per min. My application checks for new or updated files once every 1 min. I use Tika to extract the raw text from those files and send them over to Solr for indexing. My application will be running 24x7xN-days. It will not recycle unless the OS is restarted. Over at the Tika mailing list, I was told the following: "As a side note, if you are handling a bunch of files from the wild in a production environment, I encourage separating Tika into a separate jvm vs tying it into any post processing – consider tika-batch and writing separate text files for each file processed (not so efficient, but exceedingly robust). If this is demo code or you know your document set well enough, you should be good to go with keeping Tika and your postprocessing steps in the same jvm." My question is, how does Solr utilize Tika? Does it run Tika in its own JVM as an out-of-process application or does it link with the Tika JARs directly? If it links in directly, are there known issues with Solr integrated with Tika because of Tika issues? Thanks Steve
Re: How is Tika used with Solr
Solr uses Tika directly. And not in the most efficient way. It is there mostly for convenience rather than performance. So, for performance, Solr recommendation is also to run Tika separately and only send Solr the processed documents. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 10 February 2016 at 09:46, Steven White wrote: > Hi folks, > > I'm writing a file-system-crawler that will index files. The file system > is going to be very busy an I anticipate on average 10 new updates per > min. My application checks for new or updated files once every 1 min. I > use Tika to extract the raw-text off those files and send them over to Solr > for indexing. My application will be running 24x7xN-days. It will not > recycle unless if the OS is restarted. > > Over at Tika mailing list, I was told the following: > > "As a side note, if you are handling a bunch of files from the wild in a > production environment, I encourage separating Tika into a separate jvm vs > tying it into any post processing – consider tika-batch and writing > separate text files for each file processed (not so efficient, but > exceedingly robust). If this is demo code or you know your document set > well enough, you should be good to go with keeping Tika and your > postprocessing steps in the same jvm." > > My question is, how does Solr utilize Tika? Does it run Tika in its own > JVM as an out-of-process application or does it link with Tika JARs > directly? If it links in directly, are there known issues with Solr > integrated with Tika because of Tika issues? > > Thanks > > Steve
Re: How is Tika used with Solr
Here's a writeup that should help https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ On Tue, Feb 9, 2016 at 2:49 PM, Alexandre Rafalovitch wrote: > Solr uses Tika directly. And not in the most efficient way. It is > there mostly for convenience rather than performance. > > So, for performance, Solr recommendation is also to run Tika > separately and only send Solr the processed documents. > > Regards, > Alex. > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 10 February 2016 at 09:46, Steven White wrote: >> Hi folks, >> >> I'm writing a file-system-crawler that will index files. The file system >> is going to be very busy an I anticipate on average 10 new updates per >> min. My application checks for new or updated files once every 1 min. I >> use Tika to extract the raw-text off those files and send them over to Solr >> for indexing. My application will be running 24x7xN-days. It will not >> recycle unless if the OS is restarted. >> >> Over at Tika mailing list, I was told the following: >> >> "As a side note, if you are handling a bunch of files from the wild in a >> production environment, I encourage separating Tika into a separate jvm vs >> tying it into any post processing – consider tika-batch and writing >> separate text files for each file processed (not so efficient, but >> exceedingly robust). If this is demo code or you know your document set >> well enough, you should be good to go with keeping Tika and your >> postprocessing steps in the same jvm." >> >> My question is, how does Solr utilize Tika? Does it run Tika in its own >> JVM as an out-of-process application or does it link with Tika JARs >> directly? If it links in directly, are there known issues with Solr >> integrated with Tika because of Tika issues? >> >> Thanks >> >> Steve
Re: Solr architecture
So as I understand your use case, it's effectively logging actions within a user session, so why do you have to do the update in NRT? Why not just log all the user session events (with some unique key, and ensuring the session Id is in the document somewhere), then when you want to do the query, you join on the session id, and that gives you all the data records for that session. I don't really follow why it has to be 1 document (which you continually update). If you really need that aggregation, couldn't that happen offline? I guess your one saving grace is that you query using the unique ID (in your scenario) so you could use the real-time get handler, since you aren't doing a complex query (strictly it's not a search, it's a raw key lookup). But I would still question your use case; if you go the Solr route for that kind of scale with querying and indexing that much, you're going to have to throw a lot of hardware at it, as Jack says probably in the order of hundreds of machines... On 9 February 2016 at 19:00, Upayavira wrote: > Bear in mind that Lucene is optimised towards high read lower write. > That is, it puts in a lot of effort at write time to make reading > efficient. It sounds like you are going to be doing far more writing > than reading, and I wonder whether you are necessarily choosing the > right tool for the job. > > How would you later use this data, and what advantage is there to > storing it in Solr? > > Upayavira > > On Tue, Feb 9, 2016, at 03:40 PM, Mark Robinson wrote: > > Hi, > > Thanks for all your suggestions. I took some time to get the details to > > be > > more accurate. Please find what I have gathered:- > > > > My data being indexed is something like this. > > I am basically capturing all data related to a user session. > > Inside a session I have categorized my actions like actionA, actionB > > etc.., > > per page. > > So each time an action pertaining to say actionA or actionB etc..
(in > > each > > page) happens, it is updated in Solr under that session (sessionId). > > > > So in short there is only one doc pertaining to a single session > > (identified by sessionid) in my Solr index and that is retrieved and > > updated > > whenever a new action under that session occurs. > > We expect upto 4Million session per day. > > > > On an average *one session's* *doc has a size* of *3MB to 20MB*. > > So if it is *4Million sessions per day*, each session writing around *500 > > times to Solr*, it is* 2Billion writes or (indexing) per day to Solr*. > > As it is one doc per session, it is *4Million docs per day*. > > This is around *80K docs indexed per second* during *peak* hours and > > around *15K > > docs indexed per second* into Solr during* non-peak* hours. > > Number of queries per second is around *320 queries per second*. > > > > > > 1. Average size of a doc > > 3MB to 20MB > > 2. Query types:- > > Until that session is in progress, whatever data is there for that > > session so far is queried and the new action's details captured and > > appended to existing data already capturedrelated to that session > > and indexed back into Solr. So, longer the session the more data > > retrieved > > for each subsequent query to get current data captured for that session. > > Also querying can be done on timestamp etc... which is captured > > along > > with each action. > > 3. Are docs grouped somehow? > > All data related to a session are retrieved from Solr, updated and > > indexed back to Solr based on sessionId. No other grouping. > > 4. Are they time sensitive (NRT or offline process does this) > > As mentioned above this is in NRT. Each time a new user action in > > that > > session happens, we need to query existing session info already captured > > related to that session andappend this new data to this existing > > info retrieved and index it back to Solr. > > 5. Will they update or it is rebuild every time, etc. 
> > Each time a new user action occurs, the full data pertaining to that > > session so far captured is retrieved from Solr, the extra latest data > > pertaining to this new action is appended and indexed back to Solr. > > 6. And the other thing you haven't told us is whether you plan on > > _adding_ > > 2B docs a day or whether that number is the total corpus size and you are > > re-indexing the 2B docs/day. IOW, if you are adding 2B docs/day, 30 days > > later do you have 2B docs or 60B docs in your > >corpus? > >We are expecting around 4 million sessions per day (per session 500 > > writes to Solr), which turns out to be 2B indexing done per day. So after > > 30 days it would be 4Milion*30 docs in the index. > > 7. Is there any aging of docs > > No we always query against the whole corpus present. > > 8. Is any doc deleted? > > No all data remains in the index > > > > Any suggestion is very welcome! > > > > Thanks! > > Mark. > > > > > > On Mon, Feb 8, 2016 at 3:30 PM, Jack Krupa
Re: replicate indexing to second site
Making two indexing calls, one to each, works until one system is not available. Then they are out of sync. You might want to put the updates into a persistent message queue, then have both systems index from that queue. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 9, 2016, at 1:49 PM, Upayavira wrote: > > There is a Cross Datacenter replication feature in the works - not sure > of its status. > > In lieu of that, I'd simply have two copies of your indexing code - > index everything simultaneously into both clusters. > > There is, of course risks that both get out of sync, so you might want > to find some ways to identify/manage that. > > Upayavira > > On Tue, Feb 9, 2016, at 08:43 PM, tedsolr wrote: >> I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my >> primary >> data center. I am now trying to plan for disaster recovery with an >> available >> warm site. I have read (many times) the disaster recovery section in the >> Apache ref guide. I suppose I don't fully understand it. >> >> What I'd like to know is the best way to sync up the existing data, and >> the >> best way to keep that data in sync. Assume that the warm site is an exact >> copy (not at the network level) of the production cluster - so the same >> servers with the same config. All servers are virtual. The use case is >> the >> active cluster goes down and cannot be repaired, so the warm site would >> become the active site. This is a manual process that takes many hours to >> accomplish (I just need to fit Solr into this existing process, I can't >> change the process :). >> >> I expect that rsync can be used initially to copy the collection data >> folders and the zookeeper data and transaction log folders. So after >> verifying Solr/ZK is functional after the install, shut it down and >> perform >> the copy. This may sound slow but my production index size is < 100GB. Is >> this approach reasonable?
>> >> So now to keep the warm site in sync, I could use rsync on a scheduled >> basis >> but I assume there's a better way. The ref guide says to send all >> indexing >> requests to the second cluster at the same time they are sent to the >> active >> cluster. I use SolrJ for all requests. So would this entail using a >> second >> CloudSolrClient instance that only knows about the second cluster? Seems >> reasonable but I don't want to lengthen the response time for the users. >> Is >> this just a software problem to work out (separate thread)? Or is there a >> SolrJ solution (asyc calls)? >> >> Thanks!! >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html >> Sent from the Solr - User mailing list archive at Nabble.com.
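[Editorial note: Walter's queue suggestion can be sketched with in-memory queues standing in for a durable broker. This illustrates the fan-out shape only - a real deployment would use a persistent queue (e.g. a message broker or database-backed log) so a down cluster can catch up after an outage rather than diverging:]

```java
import java.util.*;
import java.util.concurrent.*;

public class QueueFanout {
    public static void main(String[] args) throws Exception {
        // One queue per cluster; the producer appends to both, and each
        // cluster's consumer drains its own queue at its own pace, so a
        // down cluster just falls behind instead of silently losing updates.
        BlockingQueue<String> primaryQ = new LinkedBlockingQueue<>();
        BlockingQueue<String> warmQ = new LinkedBlockingQueue<>();

        List<String> primaryIndexed = Collections.synchronizedList(new ArrayList<>());
        List<String> warmIndexed = Collections.synchronizedList(new ArrayList<>());

        Thread primaryConsumer = consumer(primaryQ, primaryIndexed);
        Thread warmConsumer = consumer(warmQ, warmIndexed);

        for (String doc : List.of("doc1", "doc2", "doc3")) { // producer side
            primaryQ.put(doc);
            warmQ.put(doc);
        }
        primaryQ.put("EOF");   // sentinel to stop the demo consumers
        warmQ.put("EOF");
        primaryConsumer.join();
        warmConsumer.join();
        System.out.println(primaryIndexed + " " + warmIndexed);
    }

    static Thread consumer(BlockingQueue<String> q, List<String> sink) {
        Thread t = new Thread(() -> {
            try {
                String doc;
                while (!(doc = q.take()).equals("EOF")) {
                    sink.add(doc); // real code: send to that cluster's Solr
                }
            } catch (InterruptedException ignored) { }
        });
        t.start();
        return t;
    }
}
```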
Re: How is Tika used with Solr
Thank you Erick and Alex. My main question is with a long running process using Tika in the same JVM as my application. I'm running my file-system-crawler in its own JVM (not Solr's). On the Tika mailing list, it is suggested to run Tika's code in its own JVM and invoke it from my file-system-crawler using Runtime.getRuntime().exec(). I fully understand from Alex's suggestion and the link provided by Erick that I should use Tika outside Solr. But what about using Tika within the same JVM as my file-system-crawler application - or should I be making a system call to invoke another JAR that runs in its own JVM to extract the raw text? Are there known issues with Tika when used in a long running process? Steve On Tue, Feb 9, 2016 at 5:53 PM, Erick Erickson wrote: > Here's a writeup that should help > > https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ > > On Tue, Feb 9, 2016 at 2:49 PM, Alexandre Rafalovitch > wrote: > > Solr uses Tika directly. And not in the most efficient way. It is > > there mostly for convenience rather than performance. > > > > So, for performance, Solr recommendation is also to run Tika > > separately and only send Solr the processed documents. > > > > Regards, > > Alex. > > > > Newsletter and resources for Solr beginners and intermediates: > > http://www.solr-start.com/ > > > > > > On 10 February 2016 at 09:46, Steven White wrote: > >> Hi folks, > >> > >> I'm writing a file-system-crawler that will index files. The file > system > >> is going to be very busy an I anticipate on average 10 new updates per > >> min. My application checks for new or updated files once every 1 min. > I > >> use Tika to extract the raw-text off those files and send them over to > Solr > >> for indexing. My application will be running 24x7xN-days. It will not > >> recycle unless if the OS is restarted.
> >> > >> Over at Tika mailing list, I was told the following: > >> > >> "As a side note, if you are handling a bunch of files from the wild in a > >> production environment, I encourage separating Tika into a separate jvm > vs > >> tying it into any post processing – consider tika-batch and writing > >> separate text files for each file processed (not so efficient, but > >> exceedingly robust). If this is demo code or you know your document set > >> well enough, you should be good to go with keeping Tika and your > >> postprocessing steps in the same jvm." > >> > >> My question is, how does Solr utilize Tika? Does it run Tika in its own > >> JVM as an out-of-process application or does it link with Tika JARs > >> directly? If it links in directly, are there known issues with Solr > >> integrated with Tika because of Tika issues? > >> > >> Thanks > >> > >> Steve >
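[Editorial note: the out-of-process approach Steve describes (Runtime.getRuntime().exec()) is usually written with ProcessBuilder today. A sketch, with the actual Tika command line left as a commented assumption and a harmless stand-in command so the example runs without Tika installed:]

```java
import java.io.*;
import java.util.*;

public class ExternalExtractor {
    // Runs a command in its own process and captures its output, with a
    // timeout so a hung parse cannot stall the crawler. The real call would
    // look something like: java -jar tika-app.jar -t /path/to/file
    // (assumption: adjust the jar name and flags to your Tika version).
    static String runAndCapture(List<String> command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);            // merge stderr into stdout
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) out.append(line).append('\n');
        }
        if (!p.waitFor(60, java.util.concurrent.TimeUnit.SECONDS)) {
            p.destroyForcibly();                 // kill a wedged extractor
            throw new IOException("extractor timed out");
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command so the sketch is runnable anywhere Java is.
        String out = runAndCapture(List.of("java", "-version"));
        System.out.println(out.isEmpty() ? "no output" : "got output");
    }
}
```

The isolation benefit is that an extractor crash or memory blowup kills only the child process, not the long-running crawler, which is exactly the robustness argument from the Tika list quoted above.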
Re: replicate indexing to second site
On 2/9/2016 1:43 PM, tedsolr wrote: > I expect that rsync can be used initially to copy the collection data > folders and the zookeeper data and transaction log folders. So after > verifying Solr/ZK is functional after the install, shut it down and perform > the copy. This may sound slow but my production index size is < 100GB. Is > this approach reasonable? > > So now to keep the warm site in sync, I could use rsync on a scheduled basis > but I assume there's a better way. The ref guide says to send all indexing > requests to the second cluster at the same time they are sent to the active > cluster. I use SolrJ for all requests. So would this entail using a second > CloudSolrClient instance that only knows about the second cluster? Seems > reasonable but I don't want to lengthen the response time for the users. Is > this just a software problem to work out (separate thread)? Or is there a > SolrJ solution (asyc calls)? The way I would personally handle keeping both systems in sync at the moment would be to modify my indexing system to update both systems in parallel. That likely would involve a second CloudSolrClient instance. There's a new feature called "Cross Data Center Replication" but as far as I know, it is only available in development versions, and has not been made available in any released version of Solr. http://yonik.com/solr-cross-data-center-replication/ This new feature may become available in 6.0 or a later 6.x release. I do not have any concrete information about the expected release date for 6.0. Thanks, Shawn
Re: replicate indexing to second site
Updating two systems in parallel gets into two-phase commit, instantly. So you need a persistent pool of updates that both clusters pull from. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 9, 2016, at 4:15 PM, Shawn Heisey wrote: > > On 2/9/2016 1:43 PM, tedsolr wrote: >> I expect that rsync can be used initially to copy the collection data >> folders and the zookeeper data and transaction log folders. So after >> verifying Solr/ZK is functional after the install, shut it down and perform >> the copy. This may sound slow but my production index size is < 100GB. Is >> this approach reasonable? >> >> So now to keep the warm site in sync, I could use rsync on a scheduled basis >> but I assume there's a better way. The ref guide says to send all indexing >> requests to the second cluster at the same time they are sent to the active >> cluster. I use SolrJ for all requests. So would this entail using a second >> CloudSolrClient instance that only knows about the second cluster? Seems >> reasonable but I don't want to lengthen the response time for the users. Is >> this just a software problem to work out (separate thread)? Or is there a >> SolrJ solution (asyc calls)? > > The way I would personally handle keeping both systems in sync at the > moment would be to modify my indexing system to update both systems in > parallel. That likely would involve a second CloudSolrClient instance. > > There's a new feature called "Cross Data Center Replication" but as far > as I know, it is only available in development versions, and has not > been made available in any released version of Solr. > > http://yonik.com/solr-cross-data-center-replication/ > > This new feature may become available in 6.0 or a later 6.x release. I > do not have any concrete information about the expected release date for > 6.0. > > Thanks, > Shawn >
Re: solr performance issue
1 million documents isn't considered big for Solr. How much RAM does your machine have? Regards, Edwin On 8 February 2016 at 23:45, Susheel Kumar wrote: > 1 million documents shouldn't have any issues at all. Something else is > wrong with your hw/system configuration. > > Thanks, > Susheel > > On Mon, Feb 8, 2016 at 6:45 AM, sara hajili wrote: > > > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili > wrote: > > > > > sorry i made a mistake i have a bout 1000 K doc. > > > i mean about 100 doc. > > > > > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < > > > emir.arnauto...@sematext.com> wrote: > > > > > >> Hi Sara, > > >> Not sure if I am reading this right, but I read it as you have 1000 > doc > > >> index and issues? Can you tell us bit more about your setup: number of > > >> servers, hw, index size, number of shards, queries that you run, do > you > > >> index at the same time... > > >> > > >> It seems to me that you are running Solr on server with limited RAM > and > > >> probably small heap. Swapping for sure will slow things down and GC is > > most > > >> likely reason for high CPU. > > >> > > >> You can use http://sematext.com/spm to collect Solr and host metrics > > and > > >> see where the issue is. > > >> > > >> Thanks, > > >> Emir > > >> > > >> -- > > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > >> Solr & Elasticsearch Support * http://sematext.com/ > > >> > > >> > > >> > > >> On 08.02.2016 10:27, sara hajili wrote: > > >> > > >>> hi all. > > >>> i have a problem with my solr performance and usage hardware like a > > >>> ram,cup... > > >>> i have a lot of document and so indexed file about 1000 doc in solr > > that > > >>> every doc has about 8 field in average. > > >>> and each field has about 60 char. > > >>> i set my field as a storedfield = "false" except of 1 field. // i > read > > >>> that this help performance. > > >>> i used copy field and dynamic field if it was necessary .
// i read > > that > > >>> this help performance. > > >>> and now my question is that when i run a lot of query on solr i faced > > >>> with > > >>> a problem solr use more cpu and ram and after that filled ,it use a > lot > > >>> swapped storage and then use hard,but doesn't create a system file! > > >>> solr > > >>> fill hard until i forced to restart server to release hard disk. > > >>> and now my question is why solr treat in this way? and how i can > avoid > > >>> solr > > >>> to use huge cpu space? > > >>> any config need?! > > >>> > > >>> > > >> > > > > > >
Re: Solr architecture
Thanks for your replies and suggestions! Why do I store all events related to a session under one doc? Each session can have about 500 total entries (events) corresponding to it. So when I try to retrieve a session's info it can come back with around 500 records. With this compounded one-doc-per-session approach, I can retrieve more sessions at a time. E.g. under a sessionId, an array of eventA activities, eventB activities (using JSON). When an eventA activity occurs again, we will read all the data for that session, append this extra info to the eventA data and push the whole session-related data back (indexing) to Solr. Like this for many sessions in parallel. Why NRT? Many sessions are being written in parallel (4 million sessions, hence 4 million docs per day). A person can do this querying any time. Is it just a look up? Yes. We just need to retrieve all info for a session and pass it on to another system. We may even do some extra querying on some data like timestamps, pageurl etc. in the info added to a session. We are thinking of keeping the data separate from the actual Solr instance and specifying the location of the dataDir in solrconfig. If Solr is not a good option, could you please suggest something which will satisfy this use case with minimum response time while querying. Thanks! Mark On Tue, Feb 9, 2016 at 6:02 PM, Daniel Collins wrote: > So as I understand your use case, its effectively logging actions within a > user session, why do you have to do the update in NRT? Why not just log > all the user session events (with some unique key, and ensuring the session > Id is in the document somewhere), then when you want to do the query, you > join on the session id, and that gives you all the data records for that > session. I don't really follow why it has to be 1 document (which you > continually update). If you really need that aggregation, couldn't that > happen offline?
> I guess your one saving grace is that you query using the unique ID (in your scenario), so you could use the real-time get handler, since you aren't doing a complex query (strictly it's not a search, it's a raw key lookup).
>
> But I would still question your use case: if you go the Solr route for that kind of scale, with that much querying and indexing, you're going to have to throw a lot of hardware at it; as Jack says, probably on the order of hundreds of machines...
>
> On 9 February 2016 at 19:00, Upayavira wrote:
> > Bear in mind that Lucene is optimised towards high read, low write. That is, it puts in a lot of effort at write time to make reading efficient. It sounds like you are going to be doing far more writing than reading, and I wonder whether you are choosing the right tool for the job.
> >
> > How would you later use this data, and what advantage is there to storing it in Solr?
> >
> > Upayavira
> >
> > On Tue, Feb 9, 2016, at 03:40 PM, Mark Robinson wrote:
> > > Hi,
> > > Thanks for all your suggestions. I took some time to make the details more accurate. Here is what I have gathered:
> > >
> > > My data being indexed is something like this: I am basically capturing all data related to a user session. Inside a session I have categorized my actions, like actionA, actionB, etc., per page. So each time an action pertaining to, say, actionA or actionB happens (on each page), it is updated in Solr under that session (sessionId).
> > >
> > > In short, there is only one doc per session (identified by sessionId) in my Solr index, and it is retrieved and updated whenever a new action under that session occurs. We expect up to 4 million sessions per day.
> > >
> > > On average, *one session's doc has a size of 3MB to 20MB*.
> > > So if it is *4 million sessions per day*, with each session writing around *500 times to Solr*, that is *2 billion writes (indexing operations) per day to Solr*. As it is one doc per session, it is *4 million docs per day*. This is around *80K docs indexed per second* during *peak* hours and around *15K docs indexed per second* during *non-peak* hours. The query rate is around *320 queries per second*.
> > >
> > > 1. Average size of a doc: 3MB to 20MB.
> > > 2. Query types: while a session is in progress, whatever data exists for that session so far is queried, the new action's details are appended to the data already captured related to that session, and the result is indexed back into Solr. So the longer the session, the more data each subsequent query retrieves to get the current data captured for that session. Querying can also be done on the timestamp etc. captured along with each action.
> > > 3. Are docs grouped somehow? All data related to a session are retrieved from Solr, update
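Daniel's alternative above - log each event as its own small doc keyed by the sessionId, instead of re-reading and re-indexing one large session doc on every event - can be sketched in plain Python. This is an in-memory stand-in for the index, not a Solr API; the names are illustrative:

```python
# Each event becomes an independent small doc; nothing is ever rewritten.
events = []  # stand-in for the index

def log_event(session_id, event_type, payload):
    """Index one event as its own doc carrying the session id."""
    events.append({"sessionId": session_id, "type": event_type, "data": payload})

def session_view(session_id):
    """Reassemble a session at read time: the 'query/join on sessionId' lookup."""
    return [e for e in events if e["sessionId"] == session_id]
```

The write path is then append-only (no 3-20MB read-modify-write per event); the aggregation cost moves to query time, or to an offline job, as Daniel suggests.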
RE: How is Tika used with Solr
I have one answer here [0], but I'd be interested to hear what Solr users/devs/integrators have experienced on this topic.

[0] http://mail-archives.apache.org/mod_mbox/tika-user/201602.mbox/%3CCY1PR09MB0795EAED947B53965BC86874C7D70%40CY1PR09MB0795.namprd09.prod.outlook.com%3E

-----Original Message-----
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Tuesday, February 09, 2016 6:33 PM
To: solr-user@lucene.apache.org
Subject: Re: How is Tika used with Solr

Thank you Erick and Alex.

My main question is about a long-running process using Tika in the same JVM as my application. I'm running my file-system crawler in its own JVM (not Solr's). On the Tika mailing list, it was suggested to run Tika's code in its own JVM and invoke it from my file-system crawler using Runtime.getRuntime().exec().

I fully understand, from Alex's suggestion and the link provided by Erick, the advice to use Tika outside Solr. But what about using Tika within the same JVM as my file-system-crawler application - or should I be making a system call to invoke another JAR that runs in its own JVM to extract the raw text? Are there known issues with Tika when used in a long-running process?

Steve
Re: How is Tika used with Solr
My impulse would be to _not_ run Tika in its own JVM, just catch any exceptions in my code and "do the right thing". I'm not sure I see any real benefit in yet another JVM. FWIW, Erick On Tue, Feb 9, 2016 at 6:22 PM, Allison, Timothy B. wrote: > I have one answer here [0], but I'd be interested to hear what Solr > users/devs/integrators have experienced on this topic. > > [0] > http://mail-archives.apache.org/mod_mbox/tika-user/201602.mbox/%3CCY1PR09MB0795EAED947B53965BC86874C7D70%40CY1PR09MB0795.namprd09.prod.outlook.com%3E > > -Original Message- > From: Steven White [mailto:swhite4...@gmail.com] > Sent: Tuesday, February 09, 2016 6:33 PM > To: solr-user@lucene.apache.org > Subject: Re: How is Tika used with Solr > > Thank you Erick and Alex. > > My main question is with a long running process using Tika in the same JVM as > my application. I'm running my file-system-crawler in its own JVM (not > Solr's). On Tika mailing list, it is suggested to run Tika's code in it's > own JVM and invoke it from my file-system-crawler using > Runtime.getRuntime().exec(). > > I fully understand from Alex suggestion and link provided by Erick to use > Tika outside Solr. But what about using Tika within the same JVM as my > file-system-crawler application or should I be making a system call to invoke > another JAR, that runs in its own JVM to extract the raw text? Are there > known issues with Tika when used in a long running process? > > Steve > >
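For those who do choose the separate-process route Steven describes, the general isolation pattern (run the extractor in a child process with a timeout, so a hung or crashing parse cannot take down the long-running crawler) can be sketched as follows. This is a language-neutral illustration in Python; the Tika command line in the comment is an assumption, not a tested invocation:

```python
import subprocess
import sys

def extract_text(cmd, timeout_s=60):
    """Run an external extractor command in a child process.
    Returns the extracted text, or None if the extractor hung or crashed -
    either way, the parent process survives."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # runaway parse: the child is killed, the crawler survives
    if proc.returncode != 0:
        return None  # extractor crashed; only the child process died
    return proc.stdout.strip()

# Hypothetical usage against a Tika CLI jar (paths/flags are assumptions):
# text = extract_text(["java", "-jar", "tika-app.jar", "--text", "doc.pdf"],
#                     timeout_s=120)
```

The trade-off Erick raises still applies: each call pays process-startup cost, which in-JVM Tika with try/catch avoids, at the price of sharing the crawler's fate with a pathological parse.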
Re: replicate indexing to second site
This issue might be similar to what Apple presented in the closing keynote at Solr Revolution 2014. I believe they used a queue at each site feeding into Solr. The presentation should be online.

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/

On 10 February 2016 at 07:43, tedsolr wrote:
> I have a Solr Cloud cluster (v5.2.1) using a ZooKeeper ensemble in my primary data center. I am now trying to plan for disaster recovery with an available warm site. I have read (many times) the disaster recovery section in the Apache ref guide; I suppose I don't fully understand it.
>
> What I'd like to know is the best way to sync up the existing data, and the best way to keep that data in sync. Assume that the warm site is an exact copy (not at the network level) of the production cluster - the same servers with the same config. All servers are virtual. The use case is that the active cluster goes down and cannot be repaired, so the warm site would become the active site. This is a manual process that takes many hours to accomplish (I just need to fit Solr into this existing process; I can't change the process :).
>
> I expect that rsync can be used initially to copy the collection data folders and the ZooKeeper data and transaction log folders. So after verifying Solr/ZK is functional after the install, shut it down and perform the copy. This may sound slow, but my production index size is < 100GB. Is this approach reasonable?
>
> So now, to keep the warm site in sync, I could use rsync on a scheduled basis, but I assume there's a better way. The ref guide says to send all indexing requests to the second cluster at the same time they are sent to the active cluster. I use SolrJ for all requests. So would this entail using a second CloudSolrClient instance that only knows about the second cluster? Seems reasonable, but I don't want to lengthen the response time for the users.
> Is this just a software problem to work out (a separate thread)? Or is there a SolrJ solution (async calls)?
>
> Thanks!!
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html
> Sent from the Solr - User mailing list archive at Nabble.com.
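The "second client on a separate thread" idea tedsolr asks about can be sketched language-neutrally in Python: a background thread drains a queue and forwards updates to the second cluster, so the user-facing path never waits on the remote DC. The `send` callback stands in for something like a second CloudSolrClient's add call; this is an illustration under that assumption, not production code (a durable queue is needed for real fault tolerance):

```python
import queue
import threading

class SecondaryIndexer:
    """Forward updates to a second cluster from a background thread,
    keeping the primary request path free of remote-DC latency."""

    def __init__(self, send):
        self.send = send              # e.g. wraps a second cluster's client
        self.q = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def index(self, doc):
        self.q.put(doc)               # returns immediately

    def _drain(self):
        while True:
            doc = self.q.get()
            try:
                self.send(doc)        # forward to the warm site
            except Exception:
                self.q.put(doc)       # naive retry; a real setup needs a
                                      # persistent queue, per the thread below
            self.q.task_done()
```

An in-memory queue like this loses pending updates if the indexer process dies, which is exactly the gap the later replies about a persistent update store address.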
Re: replicate indexing to second site
On 2/9/2016 5:48 PM, Walter Underwood wrote: > Updating two systems in parallel gets into two-phase commit, instantly. So > you need a persistent pool of updates that both clusters pull from. My indexing system does exactly what I have suggested for tedsolr -- it updates multiple copies of my index in parallel. My data source is MySQL. For each copy, information about the last successful update is separately tracked, so if one of the index copies goes offline, the other stays current. When the offline system comes back, it will be updated from the saved position, and will eventually have the same information as the system that did not go offline. As far as two-phase commit goes, that would make it so that neither copy of the index would stay current if one of them went offline. In most situations I can think of, that's not really very useful. Thanks, Shawn
Re: replicate indexing to second site
I agree. If the system updates synchronously, then you are in two-phase commit land. If you have a persistent store that each index can track, then things are good. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 9, 2016, at 7:37 PM, Shawn Heisey wrote: > > On 2/9/2016 5:48 PM, Walter Underwood wrote: >> Updating two systems in parallel gets into two-phase commit, instantly. So >> you need a persistent pool of updates that both clusters pull from. > > My indexing system does exactly what I have suggested for tedsolr -- it > updates multiple copies of my index in parallel. My data source is MySQL. > > For each copy, information about the last successful update is > separately tracked, so if one of the index copies goes offline, the > other stays current. When the offline system comes back, it will be > updated from the saved position, and will eventually have the same > information as the system that did not go offline. > > As far as two-phase commit goes, that would make it so that neither copy > of the index would stay current if one of them went offline. In most > situations I can think of, that's not really very useful. > > Thanks, > Shawn >
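Shawn's scheme - each index copy tracks the last source position it successfully applied, so an offline copy catches up independently when it returns - can be sketched like this. The names and the list-as-update-log are illustrative assumptions, not his actual implementation:

```python
class CheckpointedReplicator:
    """Per-copy checkpoints over a shared, ordered update log.
    Each target replays only what is past its own checkpoint, so one
    target being offline never holds back the others."""

    def __init__(self, source, targets):
        self.source = source                       # ordered (position, doc) log
        self.positions = {t: 0 for t in targets}   # last applied position per copy
        self.applied = {t: [] for t in targets}    # stand-in for each index copy

    def sync(self, target):
        """Replay everything after this copy's checkpoint (idempotent)."""
        for pos, doc in self.source:
            if pos > self.positions[target]:
                self.applied[target].append(doc)
                self.positions[target] = pos
```

Contrast with two-phase commit: here a copy going offline merely freezes its own checkpoint, instead of blocking updates to the copy that stayed up.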
Re: How to use DocValues with TextField
Hello Harry, sorry for the delayed reply. I took another approach, giving the user a different workflow, as I did not have a solution for this. But your option looks great; I will try it out.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4256316.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: replicate indexing to second site
Hello Ted. We have a similar requirement to deploy Solr across 2 DCs. In our case, the DCs are connected via fibre optic. We managed to deploy a single SolrCloud cluster across multiple DCs without any major issue (see links below). The whole set-up is described in the following articles: - http://menelic.com/2015/11/21/deploying-solrcloud-across-multiple-data-centers-dc/ - http://menelic.com/2015/12/04/deploying-solrcloud-across-multiple-data-centers-dc-performance/ - http://menelic.com/2015/12/05/allowing-solrj-cloudsolrclient-to-have-preferred-replica-for-query-operations/ - Here is the main issue we had to deal with: http://menelic.com/2015/12/30/zookeeper-shutdown-leader-reason-not-sufficient-followers-synced-only-synced-with-sids/ I believe that if your DCs are well connected, you can have a single SolrCloud cluster spanning across multiple DCs. Arcadius. On 10 February 2016 at 04:15, Walter Underwood wrote: > I agree. If the system updates synchronously, then you are in two-phase > commit land. If you have a persistent store that each index can track, then > things are good. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Feb 9, 2016, at 7:37 PM, Shawn Heisey wrote: > > > > On 2/9/2016 5:48 PM, Walter Underwood wrote: > >> Updating two systems in parallel gets into two-phase commit, instantly. > So you need a persistent pool of updates that both clusters pull from. > > > > My indexing system does exactly what I have suggested for tedsolr -- it > > updates multiple copies of my index in parallel. My data source is > MySQL. > > > > For each copy, information about the last successful update is > > separately tracked, so if one of the index copies goes offline, the > > other stays current. When the offline system comes back, it will be > > updated from the saved position, and will eventually have the same > > information as the system that did not go offline. 
> > > > As far as two-phase commit goes, that would make it so that neither copy > > of the index would stay current if one of them went offline. In most > > situations I can think of, that's not really very useful. > > > > Thanks, > > Shawn > > > > -- Arcadius Ahouansou Menelic Ltd | Information is Power M: 07908761999 W: www.menelic.com ---
Re: CorruptIndexException during optimize.
Hi,

Kindly provide your inputs on the issue.

Thanks,
Modassar

On Mon, Feb 1, 2016 at 12:40 PM, Modassar Ather wrote:
> Hi,
>
> Got the following error during an optimize of the index on 2 nodes of a 12-node cluster. Please let me know whether the index can be recovered, how to recover it, and what the reason could be.
> Total number of nodes: 12
> No replicas.
> Solr version - 5.4.0
> Java version - 1.7.0_91 (OpenJDK 64-bit)
> Ubuntu version: Ubuntu 14.04.3 LTS
>
> 2016-01-31 20:00:31.211 ERROR (qtp1698904557-9710) [c:core s:shard4 r:core_node3 x:core] o.a.s.h.RequestHandlerBase java.io.IOException: Invalid vInt detected (too many bits)
>         at org.apache.lucene.store.DataInput.readVInt(DataInput.java:141)
>         at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readNumericEntry(Lucene54DocValuesProducer.java:355)
>         at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.readFields(Lucene54DocValuesProducer.java:243)
>         at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.<init>(Lucene54DocValuesProducer.java:122)
>         at org.apache.lucene.codecs.lucene54.Lucene54DocValuesFormat.fieldsProducer(Lucene54DocValuesFormat.java:113)
>         at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:268)
>         at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:358)
>         at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
>         at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:67)
>         at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:147)
>         at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:81)
>         at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
>         at org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:384)
>         at org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416)
>         at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261)
>         at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3161)
>         at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3147)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3124)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3087)
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1741)
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1721)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:590)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:62)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1612)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1589)
>         at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:64)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>         at org.eclipse.jetty.ser
Re: Multi-lingual search
And what does proximity search exactly mean?

A proximity search matches terms that occur within a given distance of each other. For example, to search for documents where java occurs within 3 words of network:

field:"java network"~3

The query above matches documents where java and network occur close together in that field, with a slop of 3 (roughly, up to 3 positions of movement are allowed to line the terms up).

Can I implement proximity search if I use a separate core per language, a field per language, or a multilingual field that supports all languages?

A proximity search operates on a single field, so it does not matter whether that field is in the same core or a different one.

Should searching for the word "walk" fetch and display a record where "walking" was indexed? Will the stemming filter handle that?

Stemming reduces a word to its root form. So yes: if the indexed word and the query word reduce to the same root, the search will match.

Hope this helps.

Best,
Modassar

On Tue, Feb 9, 2016 at 12:58 PM, vidya wrote:
> Hi
> Can I implement proximity search if I use
> - separate core per language
> - field per language
> - multilingual field that supports all languages.
>
> And what does proximity search exactly mean?
>
> Searching for the word "walk" when "walking" is indexed - should that fetch and display the record? It will be included in the stemming filter, right?
>
> Thanks in advance
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Multi-lingual-search-tp4254398p4256094.html
> Sent from the Solr - User mailing list archive at Nabble.com.
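What the slop in field:"java network"~3 permits can be illustrated with a toy model in Python. Note this is a simplified, in-order-only sketch of two-term proximity - real Lucene slop is an edit distance that also allows reordered terms at extra cost - so treat it as an illustration of the idea, not of Lucene's exact semantics:

```python
def within_slop(tokens, first, second, slop):
    """True if `second` appears after `first` with at most `slop`
    intervening token positions (simplified two-term, in-order model
    of a phrase query with slop)."""
    positions_a = [i for i, t in enumerate(tokens) if t == first]
    positions_b = [i for i, t in enumerate(tokens) if t == second]
    # slop 0 means exact phrase (adjacent); each extra unit of slop
    # allows one more intervening token.
    return any(0 < pb - pa <= slop + 1
               for pa in positions_a for pb in positions_b)

doc = "java is a language used in network programming".split()
# 'java' at position 0, 'network' at position 6: five tokens in between,
# so slop 3 is not enough here, but slop 5 would match.
```

In this model, the fields and cores discussion from the reply holds: proximity is evaluated over one field's token positions, so core layout is irrelevant.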