Document boost in Solr
Hi,

My website www.findbestopensource.com provides search over millions of open source projects. I recently found an issue on the site. Each project has a description, a rank, and a set of other fields. The rank is set as the document boost, so that when a user performs a search, highly ranked projects appear first.

This worked fine with previous versions of Solr. Some time back I moved to 4.10, and since then I have been facing this issue: I added a highly ranked project, but when I search, that project does not show up in the results. Only results that were added under older versions of Solr are returned.

I am using Solr 4.10 with the SolrJ library.

Regards
Aditya
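For reference, a minimal SolrJ 4.x sketch of setting an index-time document boost; the core URL, field values and boost value are placeholders, not taken from the thread. One thing worth checking: index-time document boosts are folded into the field norms, so they are silently lost on fields that have omitNorms=true.

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BoostExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "project-123");
            doc.addField("PRODUCT_TITLE", "Example project");
            // the project rank applied as a document-level, index-time boost
            doc.setDocumentBoost(5.0f);
            server.add(doc);
            server.commit();
        }
    }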
Re: Document boost in Solr
I am not able to understand the debug information. Any specific parameter to look for? Regards Aditya On Sat, Nov 14, 2015 at 6:42 PM, Alexandre Rafalovitch wrote: > Did you try using debug.explain.other and seeing how it is ranked? > On 14 Nov 2015 6:28 am, "Aditya" wrote: > > > Hi > > > > My website www.findbestopensource.com provides search over millions of > > open > > source projects. > > > > I recently found this issue in my website. Each project will have its > > description and rank and other set of fields. Rank is set as document > > boost, so that when user performs a search, high ranked projects should > > appear first. > > > > It was working fine with previous versions of Solr. Some time back I > moved > > to 4.10 and after that I am facing this issue. I added a high ranked > > project and when I did a search the project is not showing up in the > search > > results. It is showing the results which were added in older versions of > > Solr. > > > > I am using Solr 4.10 and using Solrj library. > > > > Regards > > Aditya > > >
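For what it's worth, a rough SolrJ sketch of pulling that debug output: debugQuery plus explainOther pointed at the document that should have ranked higher. The query string, core URL and document id below are placeholders.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ExplainMissingDoc {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("machine learning");
            q.set("debugQuery", "true");
            // ask Solr to also explain the score of the project that is missing from the results
            q.set("explainOther", "id:the-missing-project-id");
            QueryResponse rsp = server.query(q);
            // the breakdown shows whether the document boost (via field norms) contributed at all
            System.out.println(rsp.getDebugMap().get("explainOther"));
        }
    }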
Re: Document boost in Solr
Hi I am able to analyse the score using http://explain.solr.pl/ Score of 1st record: 100% 27.12627 sum of the following: 33.47% 9.078974 sum of the following: 19.34% 5.2460585 (MATCH) max of: 19.34% 5.2460585 PRODUCT_TITLE:machin^50.0 - 0.37926888 PRODUCT_CONTENT:machin^1.5 14.13% 3.8329153 (MATCH) max of: 14.13% 3.8329153 PRODUCT_TITLE:learn^50.0 - 0.28544438 PRODUCT_CONTENT:learn^1.5 66.53% 18.047297 (MATCH) max of: 66.53% 18.047297 PRODUCT_TITLE:"machin learn"~10^50.0 - 1.3227714 PRODUCT_CONTENT:"machin learn"~10^1.5 Score of 14th record. This supposed to come in less than 10. 100% 14.135922 sum of the following: 35.52% 5.0206614 sum of the following: 18.74% 2.6496599 (MATCH) max of: 18.74% 2.6496599 PRODUCT_TITLE:machin^50.0 - 0.22348635 PRODUCT_CONTENT:machin^1.5 16.77% 2.3710015 (MATCH) max of: 16.77% 2.3710015 PRODUCT_TITLE:learn^50.0 - 0.18167646 PRODUCT_CONTENT:learn^1.5 64.48% 9.115261 (MATCH) max of: 64.48% 9.115261 PRODUCT_TITLE:"machin learn"~10^50.0 - 0.7794506 PRODUCT_CONTENT:"machin learn"~10^1.5 How can I analyse whether the document boost is applied or not. Regards Aditya On Sat, Nov 14, 2015 at 8:49 PM, Aditya wrote: > I am not able to understand the debug information. > > Any specific parameter to look for? > > Regards > Aditya > > On Sat, Nov 14, 2015 at 6:42 PM, Alexandre Rafalovitch > wrote: > >> Did you try using debug.explain.other and seeing how it is ranked? >> On 14 Nov 2015 6:28 am, "Aditya" wrote: >> >> > Hi >> > >> > My website www.findbestopensource.com provides search over millions of >> > open >> > source projects. >> > >> > I recently found this issue in my website. Each project will have its >> > description and rank and other set of fields. Rank is set as document >> > boost, so that when user performs a search, high ranked projects should >> > appear first. >> > >> > It was working fine with previous versions of Solr. Some time back I >> moved >> > to 4.10 and after that I am facing this issue. I added a high ranked >> > project and when I did a search the project is not showing up in the >> search >> > results. It is showing the results which were added in older versions of >> > Solr. >> > >> > I am using Solr 4.10 and using Solrj library. >> > >> > Regards >> > Aditya >> > >> > >
Solr client
Hi,

I am aggregating open source Solr client libraries across all languages. Below are the links. Very few of the projects are currently active; most of them were last updated a few years ago. Please send me pointers if I have missed any Solr client library.

http://www.findbestopensource.com/tagged/solr-client
http://www.findbestopensource.com/tagged/solr-gui

Regards
Ganesh

PS: Search on the website http://www.findbestopensource.com is powered by Solr.
Re: Advise on an architecture with lot of cores
Hi Manoj

There are advantages to both approaches. I recently read an article, http://lucidworks.com/blog/podcast-solr-at-scale-at-aol/ - AOL uses Solr with one core per user.

Having one core per customer helps you: 1. Easily migrate / back up the index. 2. Load the core only when required: when a user signs in, load his index; otherwise you don't need to keep his data in memory. 3. Rebuilding data for a particular user is easier. Cons: 1. If most users are actively signing in and you need to keep most of the cores loaded all the time, search will slow down. 2. Each core has its own set of files, so you can end up with a "too many open files" exception. (We faced this scenario.)

Having a single core for all: 1. This removes the headache of user-specific handling and treats the DB / index as a black box that you can query for everyone. 2. When the load grows, shard it. Cons: 1. Rebuilding the index will take more time.

Regards Aditya www.findbestopensource.com

On Tue, Oct 7, 2014 at 8:01 PM, Manoj Bharadwaj wrote: > Hi Toke, > > I don't think I answered your question properly. > > With the current 1 core/customer setup many cores are idle. The redesign we > are working on will move most of our searches to being driven by SOLR vs > database (current split is 90% database, 10% solr). With that change, all > cores will see traffic. > > We have 25G data in the index (across all cores) and they are currently in > a 2 core VM with 32G memory. We are making some changes to the schema and > the analyzers and we see the index size growing by 25% or so due to this. > And to support this we will be moving to a VM with 4 cores and 64G memory. > Hardware as such isn't a constraint. > > Regards > Manoj > > On Tue, Oct 7, 2014 at 8:47 AM, Toke Eskildsen > wrote: > > > On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote: > > > My team inherited a SOLR setup with an architecture that has a core for > > > every customer. We have a few different types of cores, say "A", "B", C", > > > and for each one of this there is a core per customer - namely "A1", > > > "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the > > > history behind the current design - the exact reasons why it was done the > > > way it was done - one probable consideration was to ensure a customer > > data > > > separate from other. > > > > It is not a bad reason. It ensures that ranked search is optimized > > towards each customer's data and makes it easy to manage adding and > > removing customers. > > > > > We want to go to a single core per type architecture, and move on to > > SOLR > > > cloud as well in near future to achieve sharding via the features cloud > > > provides. > > > > If the setup is heavy queried on most of the cores or is there are > > core-spanning searches, collapsing the user-specific cores into fewer > > super-cores might lower hardware requirements a bit. On the other hand, > > it most of the cores are idle most of the time, the 1 core/customer > > setup would be give better utilization of the hardware. > > > > Why do you want to collapse the cores? > > > > - Toke Eskildsen, State and University Library, Denmark > > > > > > >
IOException occured when talking to solr server
Hello all,

I am getting the following error. Could anyone shed some light on it? I am accessing Solr via SolrJ, and when there is more load on the server I get this error. Is there any way to overcome this situation?

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost/solr
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost/solr

Once this error is encountered, Tomcat stops responding and I need to restart the server.

Regards Aditya www.findbestopensource.com
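A common mitigation, sketched here with purely illustrative values: share one SolrJ client instance across threads and give it explicit timeouts and a larger connection pool, so a burst of load fails fast instead of piling up connections.

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class SolrClientFactory {
        // SolrJ clients are thread-safe; create one and reuse it rather than
        // constructing a new instance per request, which exhausts connections
        private static final HttpSolrServer SERVER = createServer();

        private static HttpSolrServer createServer() {
            HttpSolrServer server = new HttpSolrServer("http://localhost/solr");
            server.setConnectionTimeout(5000);            // ms to establish a connection
            server.setSoTimeout(30000);                   // ms socket read timeout
            server.setMaxTotalConnections(128);           // total pooled connections
            server.setDefaultMaxConnectionsPerHost(32);   // per Solr host
            return server;
        }

        public static HttpSolrServer get() {
            return SERVER;
        }
    }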
Re: and performance
Hi It will not affect the performance. We are doing this regularly. If you do optimize and search then there may be some impact. Regards Aditya www.findbestopensource.com On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha wrote: > Hey Guys, > > I've finally finished my Spring Java application that uses SOLR for > searches and just had performance related question about SOLR. I'm indexing > exactly 1000 *OR* 2000 records every second. Every record having 13 fields > including 'id'. Majority of the fields are solr.StrField (no filters) with > characters ranging from 5 - 50 in length and one field which is text_t > (solr.TextField) which can be of length 100 characters to 2000 characters > and has the following tokenizer and filters > >- PatternTokenizerFactory >- LowerCaseFilterFactory >- SynonymFilterFactory >- SnowballPorterFilterFactory. > > > I'm not using shards. I was hoping when searches get slow I will consider > this or should I consider this now ? > > *Questions:* > >- I'm using SOLR autoCommit (every 15 minutes) with openSearcher set as >true. I'm not using autoSoftCommit because instant availability of the >documents for search is not necessary and I don't want to chew up too > much >memory because I'm consider Cloud hosting. >* >**90 >**true >** >*will this effect the query performance of the client website if the >index grew to 10 million records ? I mean while the commit is happening >does that *effect the performance of queries* and how will this effect >the queries if the index grew to 10 million records ? >- What *hosting specs* should I get ? How much RAM ? Considering my >- client application is very simple that just register users to database >and queries SOLR and displays SOLR results. >- simple batch program adds the 1000 OR 2000 documents to SOLR every >second. > > > I'm hoping to deploy the code next week, if you guys can give me any other > advice I'd really appreciate that. > > Thanks > Ayman >
Re: and performance
Hi It totally depends upon your affordability. If you could afford go for bigger RAM, SSD drive and 64 Bit OS. Benchmark your application, with certain set of docs, how much RAM it takes, Indexing time, Search time etc. Increase the document count and perform benchmarking tasks again. This will provide more information. Everything is directly proportional to number of docs. In my case, I have basic hosting plan and i am happy with the performance. My point is you don't always need fancy hardware. Start with basic and based on the need you could change the plan. Regards Aditya www.findbestopensource.com On Wed, Jul 17, 2013 at 4:55 PM, Ayman Plaha wrote: > Thanks Aditya, can I also please get some advice on hosting. > >- What *hosting specs* should I get ? How much RAM ? Considering my >- client application is very simple that just register users to database >and queries SOLR and displays SOLR results. >- simple batch program adds the 1000 OR 2000 documents to SOLR every >second. > > I'm hoping to deploy the code next week, if you guys can give me any other > advice I'd really appreciate that. > > > On Wed, Jul 17, 2013 at 7:07 PM, Aditya >wrote: > > > Hi > > > > It will not affect the performance. We are doing this regularly. If you > do > > optimize and search then there may be some impact. > > > > Regards > > Aditya > > www.findbestopensource.com > > > > > > > > On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha > > wrote: > > > > > Hey Guys, > > > > > > I've finally finished my Spring Java application that uses SOLR for > > > searches and just had performance related question about SOLR. I'm > > indexing > > > exactly 1000 *OR* 2000 records every second. Every record having 13 > > fields > > > including 'id'. Majority of the fields are solr.StrField (no filters) > > with > > > characters ranging from 5 - 50 in length and one field which is text_t > > > (solr.TextField) which can be of length 100 characters to 2000 > characters > > > and has the following tokenizer and filters > > > > > >- PatternTokenizerFactory > > >- LowerCaseFilterFactory > > >- SynonymFilterFactory > > >- SnowballPorterFilterFactory. > > > > > > > > > I'm not using shards. I was hoping when searches get slow I will > consider > > > this or should I consider this now ? > > > > > > *Questions:* > > > > > >- I'm using SOLR autoCommit (every 15 minutes) with openSearcher set > > as > > >true. I'm not using autoSoftCommit because instant availability of > the > > >documents for search is not necessary and I don't want to chew up > too > > > much > > >memory because I'm consider Cloud hosting. > > >* > > >**90 > > >**true > > >** > > >*will this effect the query performance of the client website if the > > >index grew to 10 million records ? I mean while the commit is > > happening > > >does that *effect the performance of queries* and how will this > effect > > >the queries if the index grew to 10 million records ? > > >- What *hosting specs* should I get ? How much RAM ? Considering my > > >- client application is very simple that just register users to > > database > > >and queries SOLR and displays SOLR results. > > >- simple batch program adds the 1000 OR 2000 documents to SOLR every > > >second. > > > > > > > > > I'm hoping to deploy the code next week, if you guys can give me any > > other > > > advice I'd really appreciate that. > > > > > > Thanks > > > Ayman > > > > > >
Re: Auto Indexing in Solr
Hi

You could use a Java timer: trigger your DB import every X minutes. Another option: your application usually knows when the DB is updated, so whenever the DB changes, trigger a request to index the newly added data.

Regards Aditya www.findbestopensource.com

On Thu, Jul 25, 2013 at 11:42 AM, archit2112 wrote: > Hi Im using Solr 4's Data Import Utility to index Oracle 10g XE database. > Im > using full imports as well as delta imports. I want these processes to be > automatic. (Eg: The import processes can be timed or should be executed as > soon any data in the database is modified). I searched for the same online > and I heard people talk about CRON and scripts. However, Im not able to > figure out how to implement it. Can you please provide a tutorial like > explanation? Thanks in advance > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html > Sent from the Solr - User mailing list archive at Nabble.com. >
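A small sketch of the Java-timer option, assuming the Data Import Handler is registered at /dataimport on the target core; the URL, core name, command and interval are placeholders.

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Timer;
    import java.util.TimerTask;

    public class DihScheduler {
        public static void main(String[] args) {
            TimerTask trigger = new TimerTask() {
                @Override public void run() {
                    try {
                        URL url = new URL("http://localhost:8983/solr/collection1/dataimport"
                                + "?command=delta-import&clean=false");
                        HttpURLConnection con = (HttpURLConnection) url.openConnection();
                        con.getResponseCode();   // fire the import request, ignore the body
                        con.disconnect();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            };
            // non-daemon timer keeps the JVM alive; delta-import every 30 minutes,
            // starting one minute after startup
            new Timer("dih-scheduler").scheduleAtFixedRate(trigger, 60000L, 30L * 60L * 1000L);
        }
    }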
Re: Duplicate documents based on attribute
One option is to store the color field as a multi-valued stored field, but then you have to handle pagination manually. If that worries you, use a database: have a table with product name and color, and retrieve the data with pagination there.

If you still want to achieve it via Solr, have a separate record for every product-and-color combination: ProductName, Color, RecordType. Since Solr is schema-flexible (NoSQL-style), records can have different fields and not every record needs every field, so you can store different types of documents side by side and filter records by their type.

Regards Aditya www.findbestopensource.com

On Thu, Jul 25, 2013 at 11:01 PM, Alexandre Rafalovitch wrote: > Look for the presentations online. You are not the first store to use Solr, > there are some explanations around. Try one from Gilt, but I think there > were more. > > You will want to store data at the lowest meaningful level of search > granularity. So, in your case, it might be ProductVariation (shoes+color). > Some examples I have seen, even store it down to availability level or > price-difference level. Then, you do some post-search normalization either > by doing groups or by doing filtering. > > Solr is not a database, store what you want to find. > > Regards, >Alex. > > Personal website: http://www.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Thu, Jul 25, 2013 at 12:42 PM, Mark wrote: > > > How would I go about doing something like this. Not sure if this is > > something that can be accomplished on the index side or its something > that > > should be done in our application. > > > > Say we are an online store for shoes and we are selling Product A in red, > > blue and green. Is there a way when we search for Product A all three > > results can be returned even though they are logically the same item > (same > > product in our database). > > > > Thoughts on how this can be accomplished? > > > > Thanks > > > > - M >
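A sketch of the separate-record-per-product-and-color variant in SolrJ; the field names and ids are illustrative and only need to match your schema.

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ProductVariationIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            for (String color : new String[] {"red", "blue", "green"}) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "productA_" + color);         // one Solr doc per product+color
                doc.addField("productName", "Product A");
                doc.addField("color", color);
                doc.addField("recordType", "productVariation");  // used to filter by document type
                server.add(doc);
            }
            server.commit();
            // at query time: q=productName:"Product A"&fq=recordType:productVariation
        }
    }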
Re: processing documents in solr
Hi, The easiest solution would be to have timestamp indexed. Is there any issue in doing re-indexing? If you want to process records in batch then you need a ordered list and a bookmark. You require a field to sort and maintain a counter / last id as bookmark. This is mandatory to solve your problem. If you don't want to re-index, then you need to maintain information related to visited nodes. Have a database / solr core which maintains list of IDs which already processed. Fetch record from Solr, For each record, check the new DB, if the record is already processed. Regards Aditya www.findbestopensource.com On Mon, Jul 29, 2013 at 10:26 AM, Joe Zhang wrote: > Basically, I was thinking about running a range query like Shawn suggested > on the tstamp field, but unfortunately it was not indexed. Range queries > only work on indexed fields, right? > > > On Sun, Jul 28, 2013 at 9:49 PM, Joe Zhang wrote: > > > I've been thinking about tstamp solution int the past few days. but too > > bad, the field is avaialble but not indexed... > > > > I'm not familiar with SolrJ. Again, sounds like SolrJ is providing the > > counter value. If yes, that would be equivalent to an autoincrement id. > I'm > > indexing from Nutch though; don't know how to feed in such counter... > > > > > > On Sun, Jul 28, 2013 at 7:03 AM, Erick Erickson >wrote: > > > >> Why wouldn't a simple timestamp work for the ordering? Although > >> I guess "simple timestamp" isn't really simple if the time settings > >> change. > >> > >> So how about a simple counter field in your documents? Assuming > >> you're indexing from SolrJ, your setup is to query q=*:*&sort=counter > >> desc. > >> Take the counter from the first document returned. Increment for > >> each doc for the life of the indexing run. Now you've got, for all > intents > >> and purposes, an identity field albeit manually maintained. > >> > >> Then use your counter field as Shawn suggests for pulling all the > >> data out. > >> > >> FWIW, > >> Erick > >> > >> On Sun, Jul 28, 2013 at 1:01 AM, Maurizio Cucchiara > >> wrote: > >> > In both cases, for better performance, first I'd load just all the > IDs, > >> > after, during processing I'd load each document. > >> > For what concern the incremental requirement, it should not be > >> difficult to > >> > write an hash function which maps a non-numerical I'd to a value. > >> > On Jul 27, 2013 7:03 AM, "Joe Zhang" wrote: > >> > > >> >> Dear list: > >> >> > >> >> I have an ever-growing solr repository, and I need to process every > >> single > >> >> document to extract statistics. What would be a reasonable process > that > >> >> satifies the following properties: > >> >> > >> >> - Exhaustive: I have to traverse every single document > >> >> - Incremental: in other words, it has to allow me to divide and > >> conquer --- > >> >> if I have processed the first 20k docs, next time I can start with > >> 20001. > >> >> > >> >> A simple "*:*" query would satisfy the 1st but not the 2nd property. > In > >> >> fact, given that the processing will take very long, and the > repository > >> >> keeps growing, it is not even clear that the exhaustiveness is > >> achieved. > >> >> > >> >> I'm running solr 3.6.2 in a single-machine setting; no hadoop > >> capability > >> >> yet. But I guess the same issues still hold even if I have the solr > >> cloud > >> >> environment, right, say in each shard? > >> >> > >> >> Any help would be greatly appreciated. > >> >> > >> >> Joe > >> >> > >> > > > > >
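A SolrJ 4.x-style sketch of that ordered-field-plus-bookmark idea (from before cursorMark existed): sort on an indexed, single-valued field such as id, remember the last value processed, and resume from there with a range query. The core URL, field name and batch size are placeholders.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class ExhaustiveWalker {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            String lastId = null;   // the bookmark; persist it between runs
            while (true) {
                String qs = (lastId == null)
                        ? "*:*"
                        : "id:{" + ClientUtils.escapeQueryChars(lastId) + " TO *]";
                SolrQuery q = new SolrQuery(qs);
                q.setSort("id", SolrQuery.ORDER.asc);
                q.setRows(1000);
                SolrDocumentList docs = server.query(q).getResults();
                if (docs.isEmpty()) break;
                for (SolrDocument d : docs) {
                    // process(d) -- extract whatever statistics are needed
                    lastId = (String) d.getFieldValue("id");
                }
                // persist lastId somewhere durable so the next run resumes after it
            }
        }
    }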
Re: Solr Cloud - How to balance Batch and Queue indexing?
Hi, Do you want 5 replicas? 1 or 2 is enough. If you already have 100 million records, you don't need to do batch indexing. Push it once, Solr has the capability to soft commit every N docs. Use round robin and send documents to different core. When you search, search from all the cores. How you want to setup your servers. Master & Slave OR Fail over. In case of Master & Slave, Index documents in master and do search from replica cores. In case of Fail over, your replica will be used once your main server is failed. Regards Aditya www.findbestopensource.com On Tue, Jul 30, 2013 at 4:56 AM, SolrLover wrote: > I need some advice on the best way to implement Batch indexing with soft > commit / Push indexing (via queue) with soft commit when using SolrCloud. > > *I am trying to figure out a way to: > * > 1. Make the push indexing available almost real time (using soft commit) > without degrading the search / indexing performance. > 2. Ability to not overwrite the existing document (based on listing_id, I > assume I can use overwrite=false flag to disable overwrite). > 3. Not block the push indexing when delta indexing happens (push indexing > happens via UI, user should be able to search for the document pushed via > UI > almost instantaneously). Delta processing might take more time to complete > indexing and I don't want the queue to wait until the batch processing is > complete. > 4. Copy the updated collection for backup. > > *More information on setup: > *We have 100 million records (around 6 stored fields / 12 indexed fields). > We are planning to have 5 cores (each with 20 million documents) with 5 > replicas. > We will be always doing delta batch indexing. > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Cloud-How-to-balance-Batch-and-Queue-indexing-tp4081169.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: SOLR guidance required
Hi Kamal, It is feasible and that is the correct approach. Add additional fields like salary, experience etc to the index and filter the results. This way you could directly show the results to the user. It is always better to avoid two searches one in solr and other in db. You should maintain search fields in Solr and filter results from Solr. DB is used to maintain extended fields and may be whole document (resume). Refer to range query search http://wiki.apache.org/solr/SolrQuerySyntax Regards Aditya www.findbestopensource.com On Fri, May 10, 2013 at 9:11 AM, Kamal Palei wrote: > Dear SOLR experts > I might be asking a very silly question. As I am new to SOLR kindly guide > me. > > > I have a job site. Using SOLR to search resumes. When a HR user enters some > keywords say JAVA, MySQL etc, I search resume documents using SOLR, > retrieve 100 records and show to user. > > The problem I face is say, I retrieved 100 records, then we do filtering > for experience range, age range, salary range (using mysql query). > Sometimes it so happens that the 100 records I fetch , I do not get a > single record to show to user. When user clicks next link there might be > few records, it looks odd really. > > > I hope there must be some mechanism, by which I can associate salary, > experience, age etc with resume document during indexing. And when > I search for resumes I can give all filters accordingly and can retrieve > 100 records and strait way I can show 100 records to user without doing any > mysql query. Please let me know if this is feasible. If so, kindly give me > some pointer how do I do it. > > Best Regards > Kamal >
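A rough SolrJ sketch of that single-query approach, with the extra fields indexed alongside the resume text; the field names and ranges are only examples, not Kamal's actual schema.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ResumeSearch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/resumes");
            SolrQuery q = new SolrQuery("skills:(java AND mysql)");
            q.addFilterQuery("experience_years:[3 TO 8]");   // range filters instead of a MySQL pass
            q.addFilterQuery("salary:[30000 TO 60000]");
            q.addFilterQuery("age:[* TO 40]");
            q.setRows(100);                                   // the 100 records shown to the user
            QueryResponse rsp = server.query(q);
            System.out.println("matches: " + rsp.getResults().getNumFound());
        }
    }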
Re: Strategy for maintaining De-normalized indexes
Hi Sohail,

In my previous mail I mentioned storing categories as separate records. Store and index the category name and main-product name as one record, and index the child-product name and main product as another record. When you want the count: 1. Retrieve the main product names matching the category. 2. Retrieve the list of child products matching the main product. You may need two queries, but it is worth it: you don't need to delete a whole bunch of records.

Regards Aditya www.findbestopensource.com

On Tue, May 22, 2012 at 5:12 PM, Sohail Aboobaker wrote: > We are still in design phase, so we haven't hit any performance issues. We > do not want to discover performance issues too late during QA :) We would > rather account for any issues during the design phase. > > The refresh rate on fields that we are using from master table will be > rare. May be three or four times in a year. > > Regards, > Sohail >
Re: Is there any relationship between size of index and indexing performance?
Hi Ivan,

It depends on the number of terms that have to be loaded. If you index only a small amount of data but store a large amount, the index size may be big while the actual number of terms is small. Indexing performance is not directly proportional to index size.

Regards Aditya www.findbestopensource.com

On Mon, May 28, 2012 at 3:00 PM, Ivan Hrytsyuk wrote: > Let's assume we are indexing 1GB of data. Does size of index have any > impact on indexing performance? I.e. will we have any difference in case of > empty index vs 50 GB index? > > Thank you, Ivan >
Re: Group.query
Hi

You are doing an AND search, so you get only the products that are in both groups (prod1 and prod2). I guess you should run one query for group1 and a separate query for group2.

Regards Aditya www.findbestopensource.com

On Wed, Sep 26, 2012 at 12:26 PM, Peter Kirk wrote: > Hi > > I have "products" which belong to one or more "groups". > Products are documents in Solr, while the groups are fields (eg. > group_1_bool:true). > > For example: > > Prod1 => group1, group2 > Prod2 => group1, group2 > Prod3 => group1 > Prod4 => group2 > > I would like to execute a query which results in the groups with their > products. That is, the result should be something like: > > Group1 => Prod1, Prod2, Prod3 > Group2 => Prod1, Prod2, Prod4 > > How can I do this? > > I've been looking at group.query, but I don't think this is what I want. > > For example, "q=*:*&group.query=group_1_bool:true+AND+group_2_bool:true" > Results in 1 group called "group_1_bool:true AND group_2_bool:true", which > contains prod1 and prod2. > > > Thanks, > Peter > >
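Either route can be expressed in SolrJ: two plain queries (one per group), or a single grouped request with one group.query per group, so each group comes back with its own document list. A sketch of the latter; the core URL and limit are placeholders.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupsWithProducts {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");
            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            // one group.query per group: each returns its own list of matching products
            q.set("group.query", "group_1_bool:true", "group_2_bool:true");
            q.set("group.limit", "100");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getGroupResponse().getValues());
        }
    }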
Re: Size of logs are high
Can you check your log level? Probably log level of error would suffice for your purpose and it would most certainly reduce your log size(s). On Thu, Feb 11, 2016 at 12:53 PM, kshitij tyagi wrote: > Hi, > I have migrated to solr 5.2 and the size of logs are high. > > Can anyone help me out here how to control this? > -- Aditya Sundaram Software Engineer, Technology team AKR Tech park B Block, B1 047 +91-9844006866
Regarding JSON indexing in SOLR 4.10
Hello everyone I am running SOLR 4.10 on port 8984 by changing the default port in etc/jetty.xml. I am now trying to index all my JSON files to Solr running on 8984. The following is the command curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json -H 'Content-type:application/json' I am getting the error as following curl: (6) Could not resolve host: 00C3353DDF98B3096D4ADB96E158F0365095762B0E7FD3D0741E046B5CCA0383_Output.json curl: (6) Could not resolve host: 00C3AAD1A19F00A8295662D612022D186A77C18CD14F5F007484C750CF8B108E_Output.json curl: (6) Could not resolve host: 00C449E6FF6F69F07A8648F5DB115855133BFC592E70F45A639DD1AF4E52EC5B_Output.json curl: (6) Could not resolve host: 00C6620B7783C6CE756474748B48F29C06F59474A126D83851753C5474B38A2C_Output.json curl: (6) Could not resolve host: 00C70C0538BFEA03894F23A912E1ECBA2D7559E1F79B93380B922A99802AC764_Output.json I am learning Apache Solr for the first time. Your help will be very much appreciated. Thanks in advance Regards -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
Re: Regarding JSON indexing in SOLR 4.10
Hi Paul Thanks a lot for your help! I have one small question, I have schema that includes {Keyword,id,currency,geographic_name}. Now I have given id And Whenever I am running your script I am getting an error as 4002Document is missing mandatory uniqueKey field: id400 Can you please share your expertise advice here. Can you please guide me a good source to learn SOLR? I am learning and I would really appreciate if you can help me. Regards On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: > On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > > I am running SOLR 4.10 on port 8984 by changing the default port in > > etc/jetty.xml. I am now trying to index all my JSON files to Solr running > > on 8984. The following is the command > > > > curl ' > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > ' --data-binary *.json > > -H 'Content-type:application/json' > > The wildcard is the problem; your shell is expanding --data-binary > *.json to --data-binary foo.json bar.json baz.json and curl doesn't know > how to download bar.json and baz.json. > > Try this instead: > > for file in *.json; do > curl ' > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > ' --data-binary "$file" -H 'Content-type:application/json' > done > > Paul. > > -- > Paul Hoffman > Systems Librarian > Fenway Libraries Online > c/o Wentworth Institute of Technology > 550 Huntington Ave. > Boston, MA 02115 > (617) 442-2384 (FLO main number) > -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
Re: Regarding JSON indexing in SOLR 4.10
Hi Erick Thanks for your email. Here is the attached sample JSON file. When I indexed the same JSON file with SOLR 5.5 using bin/post it indexed successfully. Also all of my documents were indexed successfully with 5.5 and not with 4.10. Regards On Wed, Mar 30, 2016 at 3:13 PM, Erick Erickson wrote: > The document you're sending to Solr doesn't have an "id" field. The > copyField directive has > nothing to do with it. And you copyField would be copying _from_ the > id field _to_ the > Keyword field, is that what you intended? > > Even if the source and dest fields were reversed, it still wouldn't > work since there is no id > field as indicated by the error. > > Let's see one of the json files please? Are they carefully-formulated > or arbitrary files? If > carefully formulated, just switch > > Best, > Erick > > On Wed, Mar 30, 2016 at 11:26 AM, Aditya Desai wrote: > > Hi Paul > > > > Thanks a lot for your help! I have one small question, I have schema that > > includes {Keyword,id,currency,geographic_name}. Now I have given > > id > > And > > > > Whenever I am running your script I am getting an error as > > > > > > 400 > name="QTime">2Document is > > missing mandatory uniqueKey field: id name="code">400 > > > > > > Can you please share your expertise advice here. Can you please guide me > a > > good source to learn SOLR? > > > > I am learning and I would really appreciate if you can help me. > > > > Regards > > > > > > On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman wrote: > > > >> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > >> > I am running SOLR 4.10 on port 8984 by changing the default port in > >> > etc/jetty.xml. I am now trying to index all my JSON files to Solr > running > >> > on 8984. The following is the command > >> > > >> > curl ' > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > >> ' --data-binary *.json > >> > -H 'Content-type:application/json' > >> > >> The wildcard is the problem; your shell is expanding --data-binary > >> *.json to --data-binary foo.json bar.json baz.json and curl doesn't know > >> how to download bar.json and baz.json. > >> > >> Try this instead: > >> > >> for file in *.json; do > >> curl ' > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_solr_update-3Fcommit-3Dtrue&d=CwIBAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=7B13rM0e1iuqzbXK9vK6b5luu5je3SpeGunT1bf-MWA&s=R9qSptMrt9o6C0BXmeQdtm3_bx4fFbABYFja2XUFylA&e= > >> ' --data-binary "$file" -H 'Content-type:application/json' > >> done > >> > >> Paul. > >> > >> -- > >> Paul Hoffman > >> Systems Librarian > >> Fenway Libraries Online > >> c/o Wentworth Institute of Technology > >> 550 Huntington Ave. 
> >> Boston, MA 02115 > >> (617) 442-2384 (FLO main number) > >> > > > > > > > > -- > > Aditya Ramachandra Desai > > MS Computer Science Graduate Student > > USC Viterbi School of Engineering > > Los Angeles, CA 90007 > > M : +1-415-463-9864 | L : > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_in_adityardesai&d=CwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=ihbpCZYoNmoSqzckKlY5lkESOZXPuLtNIGjnLZCzj78&s=YD-dm-5blmQ07_4vYFoLz6r0NqKRNK1aHtIgHUvc48U&e= > -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai 0A0B69C000E730AE9A1F08E6D7442CC0FB94FC0512624704D06EB48E03C49E16_Output.json Description: application/json
Same origin policy for Apache Solr 5.5
Hello SOLR Experts

I would like to know how SOLR 5.5 behaves with respect to the same-origin policy, i.e. whether it allows cross-origin requests. I am trying to read data from http://localhost:8984/Solr_1/my/directory1 and display it on a UI served from http://localhost:8983/Solr_2/my/directory2. http://localhost:8983 has Solr 4.10 running and http://localhost:8984 has Solr 5.5 running. I am using JavaScript to make an XMLHttpRequest, but it fails with NS_ERROR, so I suspect the cross-origin request is being blocked.

Is this possible? Any suggestion on how to achieve this?

Thanks in advance
-- Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
Re: Same origin policy for Apache Solr 5.5
Hello Upayavira I am trying to build an application to get the data from independent stand alone SOLR4.10 and then parse that data on global map. So effectively there are two SOLRs, one is independent(4.10) and the other one is having Map APIs(SOLR 5.10 here). I want to give customers the my entire SOLR5.5 package and they just need to put the collections present in any SOLR(here SOLR 4.10). Does this help? On Mon, Apr 4, 2016 at 9:11 AM, Upayavira wrote: > Why would you want to do this? > > On Sun, 3 Apr 2016, at 04:15 AM, Aditya Desai wrote: > > Hello SOLR Experts > > > > I am interested to know if SOLR 5.5 supports Same Origin Policy. I am > > trying to read the data from > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_Solr-5F1_my_directory1&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=OZNbFMIY0w8PkqNE-rdtJ1_HXYKHVV14O9xOQHeLaTg&e= > > and > > display it on UI on > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8983_Solr-5F2_my_directory2&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=jbb1jIDNQ5S-5WilIQjNWPWj6odAi1Dw76aUZEeEsR8&e= > . > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8983&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=Qw1EoPcAPdhlW4lJ7QH1P2CcL--41WTsAPqBaGuqzmQ&e= > has Solr 4.10 running and > https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=VOLsCmWLyGadKpEldVW2r4VDXnfaJsYQGvUZlXAwPF8&e= > has > > Solr 5.5 running.I am using Javascript to make XMLHTTP request, but it is > > failing with NS_ERROR. So I doubt SOLR supports same origin policy. > > > > Is this possible? Any suggestion on how to achieve this? > > > > Thanks in advance > > > > -- > > Aditya Ramachandra Desai > > MS Computer Science Graduate Student > > USC Viterbi School of Engineering > > Los Angeles, CA 90007 > > M : +1-415-463-9864 | L : > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_in_adityardesai&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=ZYioUPYMkaBFyqZkefbXTCv8WpOtY-i-yf63sTnQMsg&e= > -- Aditya Ramachandra Desai MS Computer Science Graduate Student USC Viterbi School of Engineering Los Angeles, CA 90007 M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
How to:- Extending Tika within Solr
Hi,

I have implemented a new file-type parser for Tika. It parses a custom file type (*.mx). I would like my Solr instance to use my version of Tika with the .mx parser.

I found this via a Google search: https://lucidworks.com/blog/extending-apache-tika-capabilities/ - but it seems to be over 5 years old, and the "download project" link is broken. Can anybody help me with this?

I tried replacing the tika-* jars within contrib/extraction/lib under the Solr root with my compiled tika-* jars, but that didn't work; Solr is still using the old Tika binaries (i.e. without the .mx parser). I know that my tika-* jars work correctly, because I can run them in GUI mode and parse a test .mx file.

Thanks! - Aditya
Multilevel grouping?
Does Solr support multilevel grouping? I want to group up to 2 or 3 levels based on different fields, i.e. first group on field one, and within each group, group by field two, and so on. I am aware of facet.pivot, which does the same but retrieves only the counts. Is there any way to get the documents as well, along with the counts, in facet.pivot? -- Aditya Sundaram
Re: Multilevel grouping?
Thanks Yonik, was looking for exactly that, is there any workaround to achieve that currently? On Tue, Jul 12, 2016 at 5:07 PM, Yonik Seeley wrote: > I started this a while ago, but haven't found the time to finish: > https://issues.apache.org/jira/browse/SOLR-7830 > > -Yonik > > > On Tue, Jul 12, 2016 at 7:29 AM, Aditya Sundaram > wrote: > > Does solr support multilevel grouping? I want to group upto 2/3 levels > > based on different fields i.e 1st group on field one, within which i > group > > by field 2 etc. > > I am aware of facet.pivot which does the same but retrieves only the > count. > > Is there anyway to get the documents as well along with the count in > > facet.pivot??? > > > > -- > > Aditya Sundaram > -- Aditya Sundaram Software Engineer, Technology team AKR Tech park B Block, B1 047 +91-9844006866
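One interim workaround until SOLR-7830 lands: use facet.pivot for the nested counts, then issue one small follow-up query per pivot bucket to fetch that bucket's documents. A rough SolrJ sketch (5.x/6.x-style client construction; field names and the core URL are placeholders):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.PivotField;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.util.NamedList;

    public class PivotThenFetch {
        public static void main(String[] args) throws Exception {
            HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            q.setFacet(true);
            q.addFacetPivotField("fieldA,fieldB");            // two-level counts
            QueryResponse rsp = solr.query(q);
            NamedList<List<PivotField>> pivots = rsp.getFacetPivot();
            for (PivotField level1 : pivots.get("fieldA,fieldB")) {
                for (PivotField level2 : level1.getPivot()) {
                    SolrQuery docs = new SolrQuery("*:*");    // fetch the documents behind this count
                    docs.addFilterQuery(level1.getField() + ":\"" + level1.getValue() + "\"");
                    docs.addFilterQuery(level2.getField() + ":\"" + level2.getValue() + "\"");
                    docs.setRows(10);
                    System.out.println(level1.getValue() + "/" + level2.getValue()
                            + " -> " + solr.query(docs).getResults().getNumFound());
                }
            }
        }
    }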
Request for adding to Contributors Group
Hello! Please add my email and SolrWiki account in the ContributorsGroup. My Wiki name = AdityaChoudhuri <https://wiki.apache.org/solr/AdityaChoudhuri> Thank you. Aditya
Codec - PostingsFormat - Postings/TermsConsumer - Checkpointed merged segment.
pointed segment, any commit will commit all the uncommitted segments without any flush requirement. So for eg, if we use Solr's Optimize command, after doing a forceMerge() everything is flushed and then a commit is issued. In this commit, the custom FieldConsumer are not invoked and they do not get a chance to commit any uncommitted in-memory information. So we end up in a problem with Optimize command since the merged segment is now committed but our own in-memory merged state is not committed. Thanks in advance for reading this long question. Any thoughts are welcome. If you are aware of some implementation doing partial updates through custom codecs, please do let me know. Kind Regards, Aditya Tripathi.
Solr 4.1 default commit mode
Hi,

Can someone please confirm what the default "commit" type is for SolrCloud 4.1? As per https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig, it looks like soft commit is disabled by default (which means every index update triggers I/O). Apparently that page applies to the future SolrCloud 4.5; I would appreciate it if someone could confirm this for Solr 4.1.

My second question is: is it OK to have different commit types on different nodes that are part of my SolrCloud deployment?

Regards, Aditya -- Regards, -Aditya Sakhuja
solrcloud shards backup/restoration
Hello,

I am looking for a good backup / recovery solution for SolrCloud indexes. Specifically, I want to restore the indexes from an index snapshot, which can be taken using the replication handler's backup command. I am looking for something that works with SolrCloud 4.3 eventually, but it is still relevant if you tested with a previous version.

I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes with one node holding the restored index. Is restoring the indexes on all the nodes the best way to do it?

-- Regards, -Aditya Sakhuja
data/index naming format
Hello,

I am running Solr 4.1 for now, and am confused about the structure and naming of the contents of the data dir. I do not see index.properties being generated on a fresh Solr node start either. Can someone clarify when one should expect to see data/index vs. data/index., and index.properties along with the second variant?

-- Regards, -Aditya Sakhuja
Re: solrcloud shards backup/restoration
Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Lets say we have 3 shards across 3 solrcloud servers. 1.> I am assuming we should take a backup from each of the shard leaders to get a complete collection. do you think that will get the complete index ( not worrying about what is not hard committed at the time of backup ). ? 2.> How do we go about restoring the index in a fresh solrcloud cluster ? >From the structure of the snapshot I took, I did not see any replication.properties or index.properties which I see normally on a healthy solrcloud cluster nodes. if I have the snapshot named snapshot.20130905 does the snapshot.20130905/* go into data/index ? Thanks Aditya On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller wrote: > Phone typing. The end should not say "don't hard commit" - it should say > "do a hard commit and take a snapshot". > > Mark > > Sent from my iPhone > > On Sep 6, 2013, at 7:26 AM, Mark Miller wrote: > > > I don't know that it's too bad though - its always been the case that if > you do a backup while indexing, it's just going to get up to the last hard > commit. With SolrCloud that will still be the case. So just make sure you > do a hard commit right before taking the backup - yes, it might miss a few > docs in the tran log, but if you are taking a back up while indexing, you > don't have great precision in any case - you will roughly get a snapshot > for around that time - even without SolrCloud, if you are worried about > precision and getting every update into that backup, you want to stop > indexing and commit first. But if you just want a rough snapshot for around > that time, in both cases you can still just don't hard commit and take a > snapshot. > > > > Mark > > > > Sent from my iPhone > > > > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > > > >> The replication handler's backup command was built for pre-SolrCloud. > >> It takes a snapshot of the index but it is unaware of the transaction > >> log which is a key component in SolrCloud. Hence unless you stop > >> updates, commit your changes and then take a backup, you will likely > >> miss some updates. > >> > >> That being said, I'm curious to see how peer sync behaves when you try > >> to restore from a snapshot. When you say that you haven't been > >> successful in restoring, what exactly is the behaviour you observed? > >> > >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja < > aditya.sakh...@gmail.com> wrote: > >>> Hello, > >>> > >>> I was looking for a good backup / recovery solution for the solrcloud > >>> indexes. I am more looking for restoring the indexes from the index > >>> snapshot, which can be taken using the replicationHandler's backup > command. > >>> > >>> I am looking for something that works with solrcloud 4.3 eventually, > but > >>> still relevant if you tested with a previous version. > >>> > >>> I haven't been successful in have the restored index replicate across > the > >>> new replicas, after I restart all the nodes, with one node having the > >>> restored index. > >>> > >>> Is restoring the indexes on all the nodes the best way to do it ? > >>> -- > >>> Regards, > >>> -Aditya Sakhuja > >> > >> > >> > >> -- > >> Regards, > >> Shalin Shekhar Mangar. > -- Regards, -Aditya Sakhuja
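On the backup half, a rough sketch of driving step 1 from code: triggering the replication handler's backup command on each shard leader. The leader URLs and the backup location below are placeholders, not the actual cluster layout.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class BackupAllShardLeaders {
        public static void main(String[] args) throws Exception {
            String[] shardLeaders = {
                "http://host1:8893/solr/collectionname",
                "http://host2:8893/solr/collectionname",
                "http://host3:8893/solr/collectionname"
            };
            for (String leader : shardLeaders) {
                // issue a hard commit first if the snapshot should include recent updates
                URL url = new URL(leader + "/replication?command=backup&location=/backups/solr");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                System.out.println(leader + " -> HTTP " + con.getResponseCode());
                con.disconnect();
            }
        }
    }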
ReplicationFactor for solrcloud
Hi -

I am trying to set up 3 shards and 3 replicas for my SolrCloud deployment with 3 servers, specifying replicationFactor=3 and numShards=3 when starting the first node. I see each of the servers allocated one shard each; however, I do not see 3 replicas allocated on each node. I specifically need to have 3 replicas across 3 servers with 3 shards. Can anyone think of a reason not to have this configuration?

-- Regards, -Aditya Sakhuja
Re: solrcloud shards backup/restoration
Hi, Sorry for the late followup on this. Let me put in more details here. *The problem:* Cannot successfully restore back the index backed up with '/replication?command=backup'. The backup was generated as * snapshot.mmdd* *My setup and steps:* * * 6 solrcloud instances 7 zookeepers instances Steps: 1.> Take snapshot using *http://host1:8893/solr/replication?command=backup*, on one host only. move *snapshot.mmdd *to some reliable storage. 2.> Stop all 6 solr instances, all 7 zk instances. 3.> Delete ../collectionname/data/* on all solrcloud nodes. ie. deleting the index data completely. 4.> Delete zookeeper/data/version*/* on all zookeeper nodes. 5.> Copy back index from backup to one of the nodes. \> cp *snapshot.mmdd/* *../collectionname/data/index/* 6.> Restart all zk instances. Restart all solrcloud instances. *Outcome:* * * All solr instances are up. However, *num of docs = 0 *for all nodes. Looking at the node where the index was restored, there is a new index.yymmddhhmmss directory being created and index.properties pointing to it. That explains why no documents are reported. How do I have solrcloud pickup data from the index directory on a restart ? Thanks in advance, Aditya On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja wrote: > Thanks Shalin and Mark for your responses. I am on the same page about the > conventions for taking the backup. However, I am less sure about the > restoration of the index. Lets say we have 3 shards across 3 solrcloud > servers. > > 1.> I am assuming we should take a backup from each of the shard leaders > to get a complete collection. do you think that will get the complete index > ( not worrying about what is not hard committed at the time of backup ). ? > > 2.> How do we go about restoring the index in a fresh solrcloud cluster ? > From the structure of the snapshot I took, I did not see any > replication.properties or index.properties which I see normally on a > healthy solrcloud cluster nodes. > if I have the snapshot named snapshot.20130905 does the > snapshot.20130905/* go into data/index ? > > Thanks > Aditya > > > > On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller wrote: > >> Phone typing. The end should not say "don't hard commit" - it should say >> "do a hard commit and take a snapshot". >> >> Mark >> >> Sent from my iPhone >> >> On Sep 6, 2013, at 7:26 AM, Mark Miller wrote: >> >> > I don't know that it's too bad though - its always been the case that >> if you do a backup while indexing, it's just going to get up to the last >> hard commit. With SolrCloud that will still be the case. So just make sure >> you do a hard commit right before taking the backup - yes, it might miss a >> few docs in the tran log, but if you are taking a back up while indexing, >> you don't have great precision in any case - you will roughly get a >> snapshot for around that time - even without SolrCloud, if you are worried >> about precision and getting every update into that backup, you want to stop >> indexing and commit first. But if you just want a rough snapshot for around >> that time, in both cases you can still just don't hard commit and take a >> snapshot. >> > >> > Mark >> > >> > Sent from my iPhone >> > >> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar < >> shalinman...@gmail.com> wrote: >> > >> >> The replication handler's backup command was built for pre-SolrCloud. >> >> It takes a snapshot of the index but it is unaware of the transaction >> >> log which is a key component in SolrCloud. 
Hence unless you stop >> >> updates, commit your changes and then take a backup, you will likely >> >> miss some updates. >> >> >> >> That being said, I'm curious to see how peer sync behaves when you try >> >> to restore from a snapshot. When you say that you haven't been >> >> successful in restoring, what exactly is the behaviour you observed? >> >> >> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja < >> aditya.sakh...@gmail.com> wrote: >> >>> Hello, >> >>> >> >>> I was looking for a good backup / recovery solution for the solrcloud >> >>> indexes. I am more looking for restoring the indexes from the index >> >>> snapshot, which can be taken using the replicationHandler's backup >> command. >> >>> >> >>> I am looking for something that works with solrcloud 4.3 eventually, >> but >> >>> still relevant if you tested with a previous version. >> >>> >> >>> I haven't been successful in have the restored index replicate across >> the >> >>> new replicas, after I restart all the nodes, with one node having the >> >>> restored index. >> >>> >> >>> Is restoring the indexes on all the nodes the best way to do it ? >> >>> -- >> >>> Regards, >> >>> -Aditya Sakhuja >> >> >> >> >> >> >> >> -- >> >> Regards, >> >> Shalin Shekhar Mangar. >> > > > > -- > Regards, > -Aditya Sakhuja > -- Regards, -Aditya Sakhuja
Re: solrcloud shards backup/restoration
How does one recover from an index corruption ? That's what I am trying to eventually tackle here. Thanks Aditya On Thursday, September 19, 2013, Aditya Sakhuja wrote: > Hi, > > Sorry for the late followup on this. Let me put in more details here. > > *The problem:* > > Cannot successfully restore back the index backed up with > '/replication?command=backup'. The backup was generated as * > snapshot.mmdd* > > *My setup and steps:* > * > * > 6 solrcloud instances > 7 zookeepers instances > > Steps: > > 1.> Take snapshot using *http://host1:8893/solr/replication?command=backup > *, on one host only. move *snapshot.mmdd *to some reliable storage. > > 2.> Stop all 6 solr instances, all 7 zk instances. > > 3.> Delete ../collectionname/data/* on all solrcloud nodes. ie. deleting > the index data completely. > > 4.> Delete zookeeper/data/version*/* on all zookeeper nodes. > > 5.> Copy back index from backup to one of the nodes. > \> cp *snapshot.mmdd/* *../collectionname/data/index/* > > 6.> Restart all zk instances. Restart all solrcloud instances. > > > *Outcome:* > * > * > All solr instances are up. However, *num of docs = 0 *for all nodes. > Looking at the node where the index was restored, there is a new > index.yymmddhhmmss directory being created and index.properties pointing to > it. That explains why no documents are reported. > > > How do I have solrcloud pickup data from the index directory on a restart > ? > > Thanks in advance, > Aditya > > > > On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja > wrote: > > Thanks Shalin and Mark for your responses. I am on the same page about the > conventions for taking the backup. However, I am less sure about the > restoration of the index. Lets say we have 3 shards across 3 solrcloud > servers. > > 1.> I am assuming we should take a backup from each of the shard leaders > to get a complete collection. do you think that will get the complete index > ( not worrying about what is not hard committed at the time of backup ). ? > > 2.> How do we go about restoring the index in a fresh solrcloud cluster ? > From the structure of the snapshot I took, I did not see any > replication.properties or index.properties which I see normally on a > healthy solrcloud cluster nodes. > if I have the snapshot named snapshot.20130905 does the > snapshot.20130905/* go into data/index ? > > Thanks > Aditya > > > > On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller wrote: > > Phone typing. The end should not say "don't hard commit" - it should say > "do a hard commit and take a snapshot". > > Mark > > Sent from my iPhone > > On Sep 6, 2013, at 7:26 AM, Mark Miller wrote: > > > I don't know that it's too bad though - its always been the case that if > you do a backup while indexing, it's just going to get up to the last hard > commit. With SolrCloud that will still be the case. So just make sure you > do a hard commit right before taking the backup - yes, it might miss a few > docs in the tran log, but if you are taking a back up while indexing, you > don't have great precision in any case - you will roughly get a snapshot > for around that time - even without SolrCloud, if you are worried about > precision and getting every update into that backup, you want to stop > indexing and commit first. But if you just want a rough snapshot for around > that time, in both cases you can still just don't hard commit and take a > snapshot. 
> > > > Mark > > > > Sent from my iPhone > > > > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > > > >> The replication handler's backup command was built for pre-SolrCloud. > >> It takes a snapshot of the index but it is unaware of the transaction > >> log which is a key component in SolrCloud. Hence unless you stop > >> updates, commit your changes and then take a backup, you will likely > >> miss some updates. > >> > >> That being said, I'm curious to see how peer sync behaves when you try > >> to restore from a snapshot. When you say that you haven't been > >> successful in restoring, what exactly is the behaviour you observed? > >> > >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja < > aditya.sakh...@gmail.com> wrote: > >>> Hello, > >>> > >>> I was looking for a good backup / recovery solution for the solrcloud > >>> indexes. I am more looking for restoring the indexes from the index > >>> sn
Re: ReplicationFactor for solrcloud
Thanks Shalin. We used the maxShardsPerNode=3 as you suggest here. On Thu, Sep 12, 2013 at 4:09 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > You must specify maxShardsPerNode=3 for this to happen. By default > maxShardsPerNode defaults to 1 so only one shard is created per node. > > On Thu, Sep 12, 2013 at 3:19 AM, Aditya Sakhuja > wrote: > > Hi - > > > > I am trying to set the 3 shards and 3 replicas for my solrcloud > deployment > > with 3 servers, specifying the replicationFactor=3 and numShards=3 when > > starting the first node. I see each of the servers allocated to 1 shard > > each.however, do not see 3 replicas allocated on each node. > > > > I specifically need to have 3 replicas across 3 servers with 3 shards. Do > > we think of any reason to not have this configuration ? > > > > -- > > Regards, > > -Aditya Sakhuja > > > > -- > Regards, > Shalin Shekhar Mangar. > -- Regards, -Aditya Sakhuja
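For reference, a sketch of creating such a collection through the Collections API with that flag; the host, collection name and counts are placeholders.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class CreateCollection {
        public static void main(String[] args) throws Exception {
            // 3 shards x 3 replicas on 3 nodes requires maxShardsPerNode=3
            URL url = new URL("http://host1:8983/solr/admin/collections?action=CREATE"
                    + "&name=mycollection&numShards=3&replicationFactor=3&maxShardsPerNode=3");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            System.out.println("HTTP " + con.getResponseCode());
            con.disconnect();
        }
    }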
isolating solrcloud instance from peer updates
Hello all, Is there a way to isolate an active solr-cloud instance from all incoming replication update requests from peer nodes ? -- Regards, -Aditya Sakhuja
Block Join Faceting in Solr 7.2
I'm querying an index which has two types of child documents (let's call them ChildTypeA and ChildTypeB). I wrap the subqueries for each of these document types in a boolean clause, something like this: *q=+{! parent which=type:parent } +{! parent which=type:parent }*

I've been trying to get facet counts on documents of ChildTypeA (rolled up by parent) and I've tried the following approaches:

- Block Join Faceting using the JSON API, i.e. the unique(_root_) approach.
  - Enabled docValues on _root_.
  - *This did not scale well.*
- The BlockJoinFacet component.
  - Had to customize it, since it expects only one *ToParentBlockJoinQuery* clause to be present in the query; since I needed facet counts only on ChildTypeA, I changed it to ignore the clause on ChildTypeB.
  - I did not enable docValues on _root_, since it was not mentioned in the documentation.
  - *This approach did not scale well.*

I need advice on whether I could have done anything better in either of the two approaches I've tried so far, or whether there is some other approach I could try. Would using uniqueBlock in 7.4 help? (Though this would require me to upgrade my Solr version.)
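If the 7.4 upgrade is on the table, a rough sketch of the uniqueBlock(_root_) route through the JSON Facet API, sent via SolrJ. The field names, the child-type filter, the simplified parent query and the collection URL are placeholders, and the domain options shown are my assumption of how to scope the facet to ChildTypeA.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class UniqueBlockFacet {
        public static void main(String[] args) throws Exception {
            HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
            // simplified parent-scope query standing in for the boolean block-join query above
            SolrQuery q = new SolrQuery("{!parent which=type:parent}someChildAField:someValue");
            q.setRows(0);
            // switch the facet domain to the children of the matched parents, keep only
            // ChildTypeA docs, then roll counts up to unique parent blocks
            q.add("json.facet",
                  "{ byAttrA: { type: terms, field: attrA,"
                + "  domain: { blockChildren: \"type:parent\", filter: \"type:childA\" },"
                + "  facet: { parentCount: \"uniqueBlock(_root_)\" } } }");
            QueryResponse rsp = solr.query(q);
            System.out.println(rsp.getResponse().get("facets"));
        }
    }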