Solr partial date range search

2016-01-25 Thread vsriram30
Hi, 
I am using solr 4.6.1. I have a date field (TrieDateField) in my schema and
I am trying to perform partial date range search as given in
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates 
Query = date_field:[2016-01-11 TO NOW]
But I am getting,
"error": {
"msg": "Invalid Date String:'2016-01-11'",
"code": 400
  }

If I use a full UTC date, it works fine. So was the partial date range search
feature introduced in a version later than 4.6.1?
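For reference, the same range expressed with a full UTC timestamp — the form that does work on 4.6.1 — would be:

```text
date_field:[2016-01-11T00:00:00Z TO NOW]
```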

Thanks,
Sriram



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-partial-date-range-search-tp4253226.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr partial date range search

2016-01-26 Thread vsriram30
Thanks Shawn for providing more info. It looks like, to support partial
date range search, I would need to rely on a String wildcard search like
fieldName:2016-01*

Though this covers part of the functionality, if I want to search between a
start and an end date, such as fieldName:[2016-01-10 TO 2016-01-21], it
would not work well. That cannot be achieved by indexing the date in a
String field.

Hence, as you mentioned, I would need to upgrade my Solr to 5.x to fully
support this. Thanks again.
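For reference, Solr 5.x introduced solr.DateRangeField, which accepts truncated dates directly; a schema sketch (field and type names illustrative):

```xml
<!-- schema.xml -->
<fieldType name="daterange" class="solr.DateRangeField"/>
<field name="date_field" type="daterange" indexed="true" stored="true"/>
```

With such a field, a partial-date range query like date_field:[2016-01-10 TO 2016-01-21], or even a bare date_field:2016-01, should work as-is.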

-Sriram





Re: Solr partial date range search

2016-01-26 Thread vsriram30
Probably I should not have said it cannot be achieved, since we could still
achieve it with multiple OR clauses using wildcard matching on that String
field, though it doesn't look pretty :-)
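For example, the range fieldName:[2016-01-10 TO 2016-01-21] could be approximated on a String field with something like (illustrative):

```text
fieldName:(2016-01-1* OR 2016-01-20 OR 2016-01-21)
```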

-Sriram





Re: Solr partial date range search

2016-01-26 Thread vsriram30
Yes Eric. I have been using the full-date form of the range query until
now, but we have a requirement change to search on partial date ranges,
hence I was looking at these options.

Kind Regards,
-Sriram





Re: Solr partial date range search

2016-01-27 Thread vsriram30
I am actually using one such component that takes partial dates like
2015-10, builds full UTC dates out of them, and queries with those. But
since the wiki describes partial date search without stating that it is
available only from 5.x, I was curious whether there is some way to make it
work in 4.6.1 without my custom component.
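A minimal sketch of the kind of conversion such a component performs — expanding a partial date into a full UTC range clause. This is illustrative Python (the actual component is a Solr SearchComponent), and the function name is invented:

```python
from calendar import monthrange
from datetime import datetime

def partial_to_utc_range(partial):
    """Expand a partial date ('2015', '2015-10', '2015-10-07') into a
    full [start TO end] UTC range clause usable with a TrieDateField."""
    parts = [int(p) for p in partial.split("-")]
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    if len(parts) == 1:                      # year only
        start = datetime(parts[0], 1, 1)
        end = datetime(parts[0], 12, 31, 23, 59, 59)
    elif len(parts) == 2:                    # year-month
        y, m = parts
        start = datetime(y, m, 1)
        end = datetime(y, m, monthrange(y, m)[1], 23, 59, 59)
    else:                                    # year-month-day
        y, m, d = parts
        start = datetime(y, m, d)
        end = datetime(y, m, d, 23, 59, 59)
    return "[%s TO %s]" % (start.strftime(fmt), end.strftime(fmt))
```

For example, partial_to_utc_range("2015-10") produces [2015-10-01T00:00:00Z TO 2015-10-31T23:59:59Z], which a TrieDateField on 4.6.1 accepts.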

Thanks,
Sriram





Re: Solr partial date range search

2016-01-28 Thread vsriram30
Hi Benedetti Alessandro,

Thanks for your comments. In our application, Solr search is used in
multiple places. As for using a middle layer: our online requests go
through the search API (a middle layer built on top of Solr), whereas the
editorial tool, along with a few other custom tools, contacts Solr
directly.

Hence, instead of implementing partial date search in multiple frontends,
I have implemented a SearchComponent that parses the incoming query,
identifies the date fields in it, takes the partial date values, constructs
full UTC dates out of them, and sends the query on to the underlying index.

Thanks,
Sriram





Re: Solr partial date range search

2016-01-28 Thread vsriram30
Hi Jeyaprakash,

Thanks for your suggestions. Are you referring to DataImportHandler
properties, configured in data-config.xml? Since I am asking about partial
date search in my online queries, I am not sure this will help achieve
that. Can you please explain a bit more?

Thanks,
Sriram





Mysql data import issue

2016-01-28 Thread vsriram30
Hi,
I am using Solr 4.6.1 and I am trying to import my data from MySQL into Solr.

In MySQL, I have a table with columns
id, legacyid, otherfields...

In Solr I have columns: id, other fields. I want to map the legacyid column
of my MySQL table to Solr's id field and skip the "id" column of MySQL
while doing the import. Hence I have a mapping,


But I still get a one-to-one mapping of the MySQL id field to Solr's id
field. I even mapped the MySQL id field to an empty Solr field.

Please let me know how to prevent this from happening.
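A data-config.xml sketch of such a mapping (table and column names illustrative). One point worth noting: DIH implicitly maps any selected column whose name matches a Solr field, so leaving MySQL's own id column out of the SELECT avoids it overriding the explicit legacyid mapping:

```xml
<entity name="item" query="SELECT legacyid, field1, field2 FROM mytable">
  <field column="legacyid" name="id"/>
  <!-- field1, field2 map implicitly to same-named Solr fields -->
</entity>
```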

Thanks,
Sriram





Re: Mysql data import issue

2016-01-29 Thread vsriram30
Thanks Gora for your suggestions. Since my table contains a lot of fields,
and all the other fields have the same name in Solr and MySQL, I thought I
could specify a mapping only for the one that differs and leave the rest
as-is. But is leaving the id field out of the SELECT in the query the only
way to achieve that?

Thanks,
Sriram





Solr date retrieve back UTC

2015-02-19 Thread vsriram30
Hi,

I have a date field in my Solr schema and I index a proper UTC date into
that field. If I query Solr directly, I can see the field with the UTC time
in the JSON response.

But when I use SolrJ and get it as an object, the UTC date comes back as a
java.util.Date, and I am not able to retrieve the UTC date string from it;
I only get a long timestamp from that object.

I also see a private variable in that Date class called cDate which has
what I want (the date in UTC form), but I am not able to get the UTC value
out of that variable. Is there any better way to get a UTC timestamp out of
that field?

Thanks,
Sriram





Re: Solr date retrieve back UTC

2015-02-19 Thread vsriram30
Thanks Chris for your quick reply. As you said, I need to do some
conversion to get the UTC form back, which I thought might not be required
since the cDate field in that Date class already holds the UTC date.

The toString() doesn't actually give me a timestamp in UTC format. It gives:

Mon Sep 15 12:52:08 PDT 2014

Thanks,
V.Sriram





Re: Solr date retrieve back UTC

2015-02-19 Thread vsriram30
Thanks Chris for additional info.

Thanks,
Sriram





Best way to dump out entire solr content?

2015-03-12 Thread vsriram30
Hi All,

I have a SolrCloud cluster of 20 nodes, each node holding close to
20 million records, with a total index size of around 400 GB (20 GB per
node x 20 nodes). I am trying to find the best way to dump out the entire
Solr data, say in CSV format.

I use successive queries, incrementing the start param by 2000 while
keeping rows at 2000, and hit each individual server with distrib=false so
that I don't overload the top-level server and cause timeouts between the
top-level and lower-level servers. I get responses from Solr very quickly
while the start param is in the lower millions (< 2 million). As the start
param grows towards 16 million, Solr takes almost 2 to 3 minutes to return
those 2000 records for a single query. I assume this is because it has to
skip over all the lower index positions to reach a start offset above
16 million before it can return results.

Is there any better way to do this? I saw the cursor feature in the Solr
pagination wiki, but it is mentioned that it requires a sort on a unique
field. Would it make sense, for my use case, to sort on my Solr key field
(the uniqueKey field) with rows=2000 and keep using nextCursorMark to dump
out all the documents in CSV format?
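The cursorMark approach described above can be sketched as follows. This is illustrative Python with invented helper names, assuming "id" is the uniqueKey and a caller-supplied fetch function does the actual HTTP/JSON work:

```python
from urllib.parse import urlencode

def cursor_select_url(base_url, cursor_mark="*", rows=2000):
    """Build one /select request of a cursorMark deep-paging loop.
    cursorMark requires a sort that includes the uniqueKey field."""
    params = {
        "q": "*:*",
        "sort": "id asc",      # 'id' assumed to be the uniqueKey
        "rows": rows,
        "wt": "csv",
        "cursorMark": cursor_mark,
    }
    return base_url + "/select?" + urlencode(params)

def dump_all(base_url, fetch):
    """Yield pages of docs: fetch a page, then repeat with the returned
    nextCursorMark until it stops changing (end of results)."""
    cursor = "*"
    while True:
        page = fetch(cursor_select_url(base_url, cursor))  # dict response
        yield page["docs"]
        if page["nextCursorMark"] == cursor:               # no more results
            break
        cursor = page["nextCursorMark"]
```

Unlike increasing start, each cursorMark request resumes from the sort position of the previous page, so the per-page cost stays flat instead of growing with the offset.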

Thanks,
Sriram






Re: Best way to dump out entire solr content?

2015-03-12 Thread vsriram30
Thanks Alex for the quick response. I wanted to avoid reading the Lucene
index directly, to sidestep the complications of merging in deleted-document
info. Also, I would like to do this quite frequently, say once every two or
three days.

I am wondering whether the slowness I hit while scanning the index towards
the higher millions would be resolved by the cursor. Do you think using a
cursor with a sort on the unique key field is better than not using one, or
does it perform the same skip operations and take as much time as without
the cursor?

Thanks,
Sriram





Re: Best way to dump out entire solr content?

2015-03-12 Thread vsriram30
Thanks Alex for the explanation. Actually, since I am dumping all the
contents from Solr, I am doing a generic query of *:*, so I think it should
not take so much time, right?

But since, as you say, the internal skips with the cursor are probably more
efficient than the skips done by increasing start, I will use the cursors.
Kindly correct me if my understanding is not right.

Thanks,
Sriram





Re: Best way to dump out entire solr content?

2015-03-13 Thread vsriram30
Great! Thanks for providing more info, Toke Eskildsen.

Thanks,
Sriram





solr index design for this use case?

2015-04-15 Thread vsriram30
Hi All,

Consider this scenario: I have around 100K content pages and I want to
launch 5 sites with that content. For example, around 50K pages for site1,
40K for site2, 30K for site3, 20K for site4, and 10K for site5.

As this example shows, the sites have both overlapping and non-overlapping
content. Say a content page is present in site1, site2, and site3, and out
of 50 fields per content page, 30 fields are common between site1 and
site2, 25 between site1 and site3, and 20 between site2 and site3. My aim
is to prevent duplication as much as possible without losing too much QPS.
Hence I am considering the following options:

Option 1: Maintain an individual copy of the duplicated content for each
site, overwriting the site-specific information while indexing for each site.
Pros:
Better QPS, as no query-time joins are involved.
Cons:
Duplication of common fields for common content across sites.

Option 2: Maintain a single copy of the common fields per content page
across all overlapping sites, plus separate site-specific information for
that page, and merge them at serving time using joins.
For the joins in this approach I looked at the block join provided by Solr,
but it may not be a good fit for my case: if one site's specific info
changes, I don't want to re-index the entire block containing the other
sites as well.

Is there a better way to tackle this that avoids occupying so much space
while at the same time not reducing QPS too much?

Thanks,
Sriram





Re: solr index design for this use case?

2015-04-15 Thread vsriram30
Hi Eric,

Thanks for your response. I was planning to do the same: store the data in
a single collection with a site parameter differentiating duplicated
content for different sites. But in the future the content will run into
millions of pages, and there could be a large number of sites as well, so I
am trying to arrive at a solution that is the best of both worlds. Is there
an efficient way to store the data that prevents duplication, merges the
site-specific and common content, and still allows editing a record for an
individual site without re-indexing the entire block?

Thanks,
Sriram





Solr mysql Json import

2016-05-23 Thread vsriram30
Hi All,

I have a use case where I want to index a JSON field from MySQL into Solr.
The JSON field contains entries as key-value pairs. The JSON can be nested,
but I want to index only the first-level field/value pairs as Solr keys;
nested levels can be stored as the value of the corresponding Solr field.

Eg) 

{  
   "k1":"value1",
   "k2":"value2",
   "k3":{  
  "f1":"fv1",
  "f2":"fv2"
   },
   "k4":[  
  "v1",
  "v2",
  "v3",
  "v4"
   ]
}

The above JSON is present as the value of a MySQL column. Along with this
column, I have a few other columns in MySQL, like id, timestamp, etc.

Given this, can I import the data from MySQL to Solr with field mappings
like the following?

MySQL columns => Solr fields
id => id
timestamp => timestamp
k1 => k1
k2 => k2
k3 => k3 (type "text"; the field will contain the nested JSON)
k4 => k4 (type "text" with multiValued=true)

Can this be achieved? I have used a simple data import to move data from
MySQL to Solr, but not for complicated cases like this. I would also like
to use the timestamp field for constructing my delta-import query.
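The intended first-level mapping can be sketched as follows — illustrative Python with an invented function name, showing how scalars pass through, lists stay multivalued, and nested objects are kept as JSON text:

```python
import json

def flatten_first_level(raw_json):
    """Keep only first-level key/value pairs as Solr fields; nested
    objects are stored as their JSON text, lists stay multivalued."""
    doc = {}
    for key, value in json.loads(raw_json).items():
        if isinstance(value, dict):
            doc[key] = json.dumps(value)   # e.g. k3 -> '{"f1": "fv1", ...}'
        else:
            doc[key] = value               # k1/k2 scalars, k4 list
    return doc
```

Applied to the example JSON above, k1 and k2 map to plain values, k4 to a multivalued list, and k3 to the text of its nested object.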

Thanks,
Sriram





Re: Solr mysql Json import

2016-05-24 Thread vsriram30
It looks like this is available through an HTTP POST request, as described in
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/

Hence I assume a corresponding JSON import from MySQL should also be
available. Can someone point me to the related docs?

Thanks,
Sriram





Solr And query

2014-10-30 Thread vsriram30
Hi All,

This might be a simple question. I tried to find a solution but couldn't
find exactly what I want. I have the fields f1, f2, and f3, and I want to
do an AND query across them.

If I search for a single word in each of these 3 fields, I have no problem:
I can simply construct a query like q=f1:word1 AND f2:word2 AND f3:word3.
But if I want to search for more than one word in a field, I seem to be
required to enclose the words in double quotes, e.g. q=f1:"word1 word2" AND
f2:word3 AND f3:word4. The problem I am facing with this approach is that
word1 and word2 then have to appear in that order in field f1, which my use
case does not require. They can appear anywhere in the field, and I want
the same scoring irrespective of where they appear. In simpler words, I
just want basic term matching of those words in that field.

Hence I tried the following solutions,

1. Using a slop query:

I constructed a query like q=f1:"word1 word2"~1000 AND f2:word3 AND f3:word4.
I read that this puts more load on the CPU, as it computes the position
difference between the words and factors it into the score. I just want
plain term matching, and I don't need the score to vary with distance.

2. Using Filter query:

I constructed a query like q=word1 word2&df=f1&fq=f2:word3&fq=f3:word4. The
score is much lower, since filter query terms are not used for scoring.
Also, since I have the filter cache enabled and don't want these filter
queries cached, I don't want to use filter queries for these fields.

3. Using AND operator and df:

I constructed a query like q=word1 word2 AND f2:word3 AND f3:word4&df=f1.
This works perfectly fine: word1 and word2 are searched in f1, and the
other AND clauses also work. But now, if I want to search for 2 words in f2
as well, I am not sure how to construct the query.

e.g. q=word1 word2 AND f2:word3 word4 AND f3:word5 word6&df=f1. Here word4
and word6 will be searched against field f1, but I want them searched
against f2 and f3 respectively. Please help me with this.

Thanks,
Sriram





Re: Solr And query

2014-10-30 Thread vsriram30
Actually I found out how to form the query. I just need to use,

q=f1:(word1 word2) AND f2:(word3 word4) AND f3:(word5 word6)

Thanks,
V.Sriram





Re: Solr And query

2014-10-30 Thread vsriram30
Thanks Eric. I tried q.op=AND and noticed that it is equivalent to
specifying,
q=f1:(word1 AND word2) AND f2:(word3 AND word4) AND f3:(word5 AND word6)





Re: Solr And query

2014-10-30 Thread vsriram30
Yes Erick, correctly pointed out.

Thanks,
Sriram





Solrcloud open new searcher not happening in slave for deletebyID

2015-01-27 Thread vsriram30
Hi All,

I am using SolrCloud 4.6.1. If I use CloudSolrServer to add a record to
Solr, I see the following commit update command in both the master and the
slave node:

2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

I am also setting updateRequest.setCommitWithin(5000);

As noticed, openSearcher=true here, and hence after 5 seconds I am able to
see the record in the index on both the slave and the master.

Now if I trigger another UpdateRequest with only deleteById set and no
documents to add, with the same commit-within time, then

in the master log I see,

2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

and in the slave log I see,
2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

As noticed here, the master has openSearcher=true while the slave has
openSearcher=false. This causes inconsistency in the results: the master
shows the record as deleted while the slave still returns it.

After digging through the code a bit, I think this probably happens in
CommitTracker, where openSearcher may be false when the CommitUpdateCommand
is created.

Can you advise whether a ticket has been created to address this issue, or
can I create one? Also, is there any workaround until the bug is fixed,
other than setting the server-side commit-within duration to a lower value?

Thanks,
V.Sriram





Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-01-28 Thread vsriram30
Thanks Shawn.  Not sure whether I will be able to test it out with 4.10.3.  I
will try the workarounds and update.

Thanks,
V.Sriram





Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-02-09 Thread vsriram30
I tried deleteByQuery, but a new searcher is still not opened on the
replicas. Hence I configured Solr to issue a soft commit every second. I
didn't try this with the latest Solr 4.10.3.
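The one-second soft-commit workaround can be configured in the updateHandler section of solrconfig.xml:

```xml
<autoSoftCommit>
  <maxTime>1000</maxTime>  <!-- soft commit (open searcher) every 1000 ms -->
</autoSoftCommit>
```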

Thanks,
V.Sriram





Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-02-09 Thread vsriram30
Thanks Anshum for additional info.

- Sriram


