Solr partial date range search
Hi,

I am using Solr 4.6.1. I have a date field (TrieDateField) in my schema and I am trying to perform a partial date range search as described in https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

Query: date_field:[2016-01-11 TO NOW]

But I am getting:

"error": { "msg": "Invalid Date String:'2016-01-11'", "code": 400 }

If I use the full UTC date form, it works fine. So was the partial date range search feature introduced in a version later than 4.6.1?

Thanks,
Sriram

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-partial-date-range-search-tp4253226.html
Sent from the Solr - User mailing list archive at Nabble.com.
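For context: on Solr 4.x, TrieDateField range queries only accept full ISO-8601 instants ending in Z (plus date math like NOW); DateRangeField, which accepts truncated dates such as 2016-01-11 directly, arrived in Solr 5.0. A minimal sketch of padding a partial date out to the full form 4.6.1 expects (the helper name `pad_partial_date` is mine, not a Solr API):

```python
from datetime import datetime

# Accepted "partial" input shapes, from coarsest to finest.
_FORMATS = ["%Y", "%Y-%m", "%Y-%m-%d",
            "%Y-%m-%dT%H", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]

def pad_partial_date(value: str) -> str:
    """Expand a partial date like '2016-01-11' into the full UTC instant
    '2016-01-11T00:00:00Z' that TrieDateField range queries require."""
    for fmt in _FORMATS:
        try:
            dt = datetime.strptime(value, fmt)
            return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            continue
    raise ValueError("unrecognized partial date: %r" % value)
```

With this, the failing query can be rewritten as date_field:[2016-01-11T00:00:00Z TO NOW], which 4.6.1 accepts.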
Re: Solr partial date range search
Thanks Shawn for providing more info.

It looks like, to support partial date range search on 4.6.1, I would need to rely on a string wildcard search such as fieldName:2016-01*. Though this covers part of the functionality, it does not work if I want to search between a start date and an end date, e.g. fieldName:[2016-01-10 TO 2016-01-21]; that cannot be achieved by indexing the date in a string field. Hence, as you mentioned, I would need to upgrade my Solr to 5.x to fully support this.

Thanks again,
-Sriram
Re: Solr partial date range search
Probably I should not have said it cannot be achieved; we could still achieve it with multiple OR'ed wildcard queries on that string field, though it doesn't look pretty :-)

-Sriram
Re: Solr partial date range search
Yes Eric. I have been using the full-date form of the range query until now, and we have a requirement change to search based on partial date ranges; hence I was looking at these options.

Kind Regards,
-Sriram
Re: Solr partial date range search
I am actually using one such component to take in partial dates like 2015-10, build full UTC dates out of them, and query with those. But since I was reading that wiki page about partial date search, and it did not say the feature is available only from 5.x onward, I was curious whether there is some way to make it work in 4.6.1 without needing my custom component.

Thanks,
Sriram
Re: Solr partial date range search
Hi Benedetti Alessandro,

Thanks for your comments. In our application, Solr search is used in multiple places. Regarding a middle layer: our online requests go through the search API (middle layer) built on top of Solr, whereas the editorial tool, along with a few other custom tools, contacts Solr directly. Hence, instead of implementing partial date search in multiple frontends, I implemented a SearchComponent that parses the incoming query, identifies the date fields in it, takes the partial date values, constructs full UTC dates from them, and sends the query on to search the underlying index.

Thanks,
Sriram
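The expansion such a SearchComponent performs can be sketched as turning a partial date into an inclusive range covering the whole period, using Solr date math for the upper bound (the function name `partial_to_range` is illustrative, not part of any Solr API):

```python
from datetime import datetime

def partial_to_range(value: str) -> str:
    """Expand a partial date ('2015', '2015-10', or '2015-10-21') into a
    TrieDateField range covering the whole year/month/day. The upper
    bound uses Solr date math to step to the end of the period."""
    if len(value) == 4:            # year only
        unit, start = "YEAR", datetime.strptime(value, "%Y")
    elif len(value) == 7:          # year-month
        unit, start = "MONTH", datetime.strptime(value, "%Y-%m")
    else:                          # year-month-day
        unit, start = "DAY", datetime.strptime(value, "%Y-%m-%d")
    lo = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    return "[%s TO %s+1%s-1SECOND]" % (lo, lo, unit)
```

For example, a query on 2015-10 becomes date_field:[2015-10-01T00:00:00Z TO 2015-10-01T00:00:00Z+1MONTH-1SECOND].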
Re: Solr partial date range search
Hi Jeyaprakash,

Thanks for your suggestions. Are you referring to DataImportHandler properties, configured in data-config.xml? Since I am referring to partial date search in my online queries, I am not sure whether that will help; can you please explain a bit more?

Thanks,
Sriram
Mysql data import issue
Hi,

I am using Solr 4.6.1 and I am trying to import my data from MySQL into Solr. In MySQL I have a table with columns id, legacyid, and other fields; in Solr I have fields id and the other fields. I want to map the legacyid column of my MySQL table to Solr's id field and skip MySQL's id column during the import. Hence I have a mapping, but I still get a one-to-one mapping of the MySQL id column to Solr's id field. I even mapped the MySQL id column to an empty Solr field, but the MySQL id still ends up in Solr's id field. Can you please let me know how to prevent this from happening?

Thanks,
Sriram
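One common way to avoid the implicit id-to-id mapping is to alias the column in SQL so DIH never sees MySQL's own id at all. A data-config.xml sketch along those lines (table, column, and connection details here are placeholders, not from the thread):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"/>
  <document>
    <entity name="content"
            query="SELECT legacyid AS id, field1, field2 FROM mytable">
      <!-- legacyid is exposed to DIH under the name 'id', so implicit
           name matching maps it to Solr's id field. MySQL's own id
           column is never selected, so it cannot leak into Solr. -->
    </entity>
  </document>
</dataConfig>
```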
Re: Mysql data import issue
Thanks Gora for your suggestions. Since my table contains a lot of fields, and all the other fields have the same names in Solr and MySQL, I thought I could provide a mapping only for the one that differs and leave the rest as-is. But is excluding the id column from the returned query the only way to achieve that?

Thanks,
Sriram
Solr date retrieve back UTC
Hi,

I have a date field in my Solr schema and I am indexing a proper UTC date into it. If I query Solr directly, I can see the field with the UTC time in the JSON response. But when I use SolrJ and get the value as an object, it is of type Date, and I cannot retrieve the UTC date string back from it; I only get the long timestamp. I also see a private variable in that Date class called cDate which holds what I want (the date in UTC form), but I am not able to get at that value. Is there a better way to get the UTC timestamp out of that field?

Thanks,
Sriram
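The epoch value inside a java.util.Date carries no time zone, so the UTC string Solr returned can always be reproduced by formatting that value with a UTC formatter (in Java, a SimpleDateFormat with its time zone set to UTC). The same conversion, sketched here in Python on the epoch milliseconds that Date.getTime() returns:

```python
from datetime import datetime, timezone

def epoch_millis_to_solr_utc(millis: int) -> str:
    """Render epoch milliseconds (what java.util.Date.getTime() gives)
    as the ISO-8601 UTC string Solr uses, e.g. 1970-01-01T00:00:00Z."""
    dt = datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```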
Re: Solr date retrieve back UTC
Thanks Chris for your quick reply. As you said, I need to do some conversion to get the UTC string back, which I thought might not be required since the cDate field inside the Date class already holds the UTC date. toString() doesn't actually give me the timestamp in UTC format; it gives:

Mon Sep 15 12:52:08 PDT 2014

Thanks,
V.Sriram
Re: Solr date retrieve back UTC
Thanks Chris for the additional info.

Thanks,
Sriram
Best way to dump out entire solr content?
Hi All,

I have a SolrCloud cluster of 20 nodes, each node holding close to 20 million records, with a total index size of around 400 GB (20 GB per node x 20 nodes). I am trying to find the best way to dump out the entire Solr data, say in CSV format.

Currently I issue successive queries, incrementing the start param by 2000 with rows=2000, hitting each individual server with distrib=false so that I don't overload the top-level server and cause timeouts between the top-level and lower-level servers. Solr responds very quickly while the start param is in the lower millions (< 2 million). But as start grows towards 16 million, Solr takes almost 2 to 3 minutes to return those 2000 records for a single query. I assume this is because it has to skip over all the lower index positions to reach a start of > 16 million before returning results.

Is there a better way to do this? I saw the cursor feature in the Solr pagination wiki, but it is mentioned that it requires a sort on a unique field. Would it make sense for my use case to sort on my Solr unique key field, keep rows at 2000, and keep following nextCursorMark to dump out all the documents in CSV format?

Thanks,
Sriram
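The cursorMark approach from the pagination wiki amounts to a loop that threads each response's nextCursorMark back into the next request. A sketch, where the `fetch` callable stands in for an HTTP request to /select and the function name is mine:

```python
def dump_all(fetch, rows=2000):
    """Page through an entire index with cursorMark.

    `fetch(params)` must return the decoded JSON response for a /select
    request. Deep paging stays cheap because each request resumes from
    the previous cursor instead of skipping `start` documents.
    """
    cursor = "*"                     # "*" starts a new cursor
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",        # cursorMark requires a uniqueKey sort
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:    # cursor unchanged: nothing left
            break
        cursor = next_cursor
```

Note that start must not be combined with cursorMark; the sort on the uniqueKey is what makes the cursor resumable.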
Re: Best way to dump out entire solr content?
Thanks Alex for the quick response. I wanted to avoid reading the Lucene index directly, to avoid the complications of merging in deleted-document information. I would also like to do this fairly frequently, say once every two or three days. I am wondering whether the slowness I faced while scraping the index in the higher millions will be resolved with a cursor. Do you think using a cursor, with a sort on the unique key field, is better than not using one? Doesn't it do the same skip operations and take just as long as without the cursor?

Thanks,
Sriram
Re: Best way to dump out entire solr content?
Thanks Alex for the explanation. Since I am scraping all the content from Solr, I am doing a generic query of *:*, so I thought it should not take so much time. But as you say, the internal skips done with a cursor are probably more efficient than the skipping done by increasing start, so I will use cursors. Kindly correct me if my understanding is not right.

Thanks,
Sriram
Re: Best way to dump out entire solr content?
Great! Thanks for providing more info, Toke Eskildsen.

Thanks,
Sriram
solr index design for this use case?
Hi All,

Consider this scenario: I have around 100K content items and I want to launch 5 sites with that content, for example around 50K items for site1, 40K for site2, 30K for site3, 20K for site4, and 10K for site5. As this example shows, the sites have some overlapping content and some non-overlapping content. Say a content page is present in site1, site2 and site3; out of 50 fields per content page, say 30 fields are common between site1 and site2, 25 common between site1 and site3, and 20 between site2 and site3. My aim is to prevent duplication as much as possible without losing too much QPS. Hence I am considering the following options:

Option 1: Maintain an individual copy of duplicated content for each site, overwriting the site-specific information while indexing for each site.
Pros: Better QPS, as no query-time joins are involved.
Cons: Duplication of common fields for common content across sites.

Option 2: Maintain a single copy of the common fields per content item across all overlapping sites, plus separate site-specific information for that content, and merge them at serving time using joins. For the joins I looked at the block join support in Solr, and it may not be a good fit for my case: if one site's information changes, I don't want to re-index the entire block containing the other sites as well.

Is there a better way to tackle this that avoids occupying so much space while at the same time not reducing QPS too much?

Thanks,
Sriram
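Option 1 (one merged document per content item per site) can be sketched as an index-time merge of the common fields with the site-specific overrides; all names here (`build_site_docs`, the `site` and `content_id` fields) are illustrative, not from the thread:

```python
def build_site_docs(common, site_overrides):
    """Produce one Solr document per site: start from the common fields,
    then let the site-specific fields overwrite or extend them.
    `site_overrides` maps site name -> dict of overriding fields."""
    docs = []
    for site, overrides in sorted(site_overrides.items()):
        doc = dict(common)
        doc.update(overrides)
        doc["site"] = site
        # Unique key per (content item, site) pair.
        doc["id"] = "%s_%s" % (common["content_id"], site)
        docs.append(doc)
    return docs
```

Queries for a given site would then add a filter such as fq=site:site1, at the cost of the duplication Option 1 accepts.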
Re: solr index design for this use case?
Hi Eric,

Thanks for your response. I was planning to do the same: store the data in a single collection, with a site parameter differentiating duplicated content for different sites. But my use case is that in future the content will run into millions, and there could potentially be a large number of sites as well. Hence I am trying to arrive at a solution that is the best of both worlds. Is there an efficient way to store the content without duplication, merging the site-specific and common content, while still being able to edit a record for an individual site without having to re-index the entire block?

Thanks,
Sriram
Solr mysql Json import
Hi All,

I have a use case where I want to index a JSON field from MySQL into Solr. The JSON field contains entries as key-value pairs. The JSON can be nested, but I want to index only the first-level key-value pairs into Solr fields; the nested levels can be stored as the value of the corresponding Solr field. E.g.:

{
  "k1":"value1",
  "k2":"value2",
  "k3":{ "f1":"fv1", "f2":"fv2" },
  "k4":[ "v1", "v2", "v3", "v4" ]
}

The above JSON is the value of one MySQL column. Along with this column, I have a few other columns in MySQL such as id, timestamp, etc. Given this, can I import the data from MySQL to Solr with field mappings like:

MySQL columns => Solr fields
id => id
timestamp => timestamp
k1 => k1
k2 => k2
k3 => k3 (a "text" field containing the nested JSON)
k4 => k4 (a "text" field with multiValued=true)

Can this be achieved? I have only done simple data imports from MySQL to Solr, not such complicated use cases. I would also like to use the timestamp column for constructing my delta-import query.

Thanks,
Sriram
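The first-level flattening described above can be sketched independently of DIH (a transformer could apply the same logic). Assumed behavior, matching the mapping table: scalars pass through, top-level arrays become multivalued fields, and nested objects are re-serialized as JSON strings for a plain text field:

```python
import json

def flatten_first_level(raw_json: str) -> dict:
    """Map only the first level of a JSON object to Solr field values."""
    data = json.loads(raw_json)
    fields = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested object: keep it as a serialized JSON string.
            fields[key] = json.dumps(value, sort_keys=True)
        else:
            # Scalar, or list (which maps to a multivalued field).
            fields[key] = value
    return fields
```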
Re: Solr mysql Json import
It looks like this is available through an HTTP POST request, as described in https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/. Hence I assume a corresponding JSON data import from MySQL should also be available. Can someone point me to the related docs?

Thanks,
Sriram
Solr And query
Hi All,

This might be a simple question; I tried to find a solution, but could not find exactly what I want. I have the fields f1, f2 and f3, and I want to do an AND query across them. If I want to search for a single word in each of these 3 fields, I have no problem; I can simply construct a query like:

q=f1:word1 AND f2:word2 AND f3:word3

But if I want to search for more than one word in a field, I am required to enclose the words in double quotes, e.g.:

q=f1:"word1 word2" AND f2:word3 AND f3:word4

The problem with this approach is that word1 and word2 must appear in that order in field f1. For my use case I don't require that: the words can be present anywhere in the field, and I want the same score irrespective of where they appear. In simpler words, I just want basic term matching of those words in that field. Hence I tried the following solutions:

1. A slop query: q=f1:"word1 word2"~1000 AND f2:word3 AND f3:word4. I read that this puts more load on the CPU, as it computes the position difference between the words and factors it into the score. I just want plain term matching, and I don't want the score to vary based on distance.

2. A filter query: q=word1 word2&df=f1&fq=f2:word3&fq=f3:word4. The score is much lower, since the filter-query terms are not used for scoring. Also, since I have the filter cache enabled, I don't want these filter queries to be cached; hence I don't want to use filter queries for these fields.

3. The AND operator with df: q=word1 word2 AND f2:word3 AND f3:word4&df=f1. This works perfectly fine: word1 and word2 are searched in f1, and the other AND clauses also work. But if I want to search for 2 words in f2 as well, I am not sure how to construct the query. With q=word1 word2 AND f2:word3 word4 AND f3:word5 word6&df=f1, word4 and word6 are searched against field f1, but I want them searched in f2 and f3 respectively.

Please help me with this.
Thanks,
Sriram
Re: Solr And query
Actually, I found out how to form the query. I just need to use:

q=f1:(word1 word2) AND f2:(word3 word4) AND f3:(word5 word6)

Thanks,
V.Sriram
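The grouped-clause form above can be generated mechanically from a field-to-terms mapping; a small sketch (the function name `build_and_query` is mine):

```python
def build_and_query(field_terms):
    """Build a Lucene/Solr query string like
    f1:(word1 word2) AND f2:(word3 word4)
    from an ordered mapping of field name -> list of terms."""
    clauses = []
    for field, terms in field_terms.items():
        clauses.append("%s:(%s)" % (field, " ".join(terms)))
    return " AND ".join(clauses)
```

With q.op=AND in effect, every term inside each group becomes required, without the adjacency constraint a quoted phrase would impose.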
Re: Solr And query
Thanks Eric. I tried q.op=AND and noticed that it is equivalent to making every term in each group required, i.e. q=f1:(+word1 +word2) AND f2:(+word3 +word4) AND f3:(+word5 +word6)
Re: Solr And query
Yes Erick. Correctly pointed out.

Thanks,
Sriram
Solrcloud open new searcher not happening in slave for deletebyID
Hi All,

I am using SolrCloud 4.6.1. If I use CloudSolrServer to add a record to Solr, I see the following commit update command on both the master and the slave node:

2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

I am also setting updateRequest.setCommitWithin(5000). As noticed, openSearcher=true, and hence after 5 seconds I can see the record in the index on both the slave and the master.

Now, if I trigger another UpdateRequest with only deleteById set and no documents to add, with the same commit-within time, then in the master log I see:

2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

and in the slave log I see:

2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

As noticed, the master has openSearcher=true while the slave has openSearcher=false. This causes inconsistency in the results: the master shows the record as deleted while the slave still returns it. After digging through the code a bit, I think this is probably happening in CommitTracker, where openSearcher may be false while creating the CommitUpdateCommand. Can you advise whether there is an existing ticket for this issue, or should I create one? Also, is there any workaround until the bug is fixed, other than setting the commit-within duration on the server to a lower value?

Thanks,
V.Sriram
Re: Solrcloud open new searcher not happening in slave for deletebyID
Thanks Shawn. I am not sure whether I will be able to test it with 4.10.3. I will try the workarounds and report back.

Thanks,
V.Sriram
Re: Solrcloud open new searcher not happening in slave for deletebyID
I tried with deleteByQuery, but a new searcher is still not opened on the replicas. Hence I configured Solr to issue a soft commit every second. I did not try this with the latest Solr 4.10.3.

Thanks,
V.Sriram
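The one-second soft-commit workaround corresponds to an autoSoftCommit setting in solrconfig.xml; a sketch, with the interval values being illustrative choices rather than recommendations from the thread:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Soft-commit (open a new searcher) every second, so deletes become
       visible on replicas even when the forwarded commit arrives with
       openSearcher=false. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <!-- Keep a longer hard-commit interval for durability, without
       opening a searcher on each hard commit. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```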
Re: Solrcloud open new searcher not happening in slave for deletebyID
Thanks Anshum for the additional info.

- Sriram