How to show just the parent domains from results in Solr
Hi All, I've indexed documents in my Solr 4.0 index with fields like URL, page_content, etc. When I run a search query against page_content I get a lot of URLs back. Say I have 15 URL domains in total, and all the pages under these 15 domains are indexed in Solr. Is there a way I can get just the parent domains in the search results instead of every individual URL? For example, say searching for "abc" returns:
www.aa.com/11.html
www.aa.com/12.html
www.aa.com/13.html
www.bb.com/15.html
www.bb.com/18.html
I want the results to be like this:
www.aa.com
www.bb.com
Is there a way in Solr to achieve this? I've tried FieldCollapsing [https://wiki.apache.org/solr/FieldCollapsing], but either it's not the right solution or I'm not able to use it properly. Could someone help me find a solution to the above problem? Thanks in advance. Regards, KK
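One possible approach (a sketch, not from the original thread): if each document also stores its domain in a separate single-valued string field at index time, say "domain" (an assumed field holding e.g. "www.aa.com"), Result Grouping can collapse the results to one entry per domain:

  http://localhost:8983/solr/collection1/select?q=page_content:abc&group=true&group.field=domain&group.limit=1&wt=json

With group.limit=1 each domain shows up once in the grouped response; the host and core name "collection1" are also assumptions. Faceting on the same field (facet=true&facet.field=domain) is another way to get just the list of domains that match.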
Segments count increased to around 7200, index remains unoptimized
Hi All, I'm running Solr 4.0 on a Linux machine with around 30GB RAM. I have 2 cores running under Solr as below:
Core AA: around 30 GB data, segments count = 30
Core BB: around 216 GB data, segments count = 300
Solr is running through Jetty and I've allocated a max of 12GB heap memory through Java, like: java -Xms4GB -Xmx12GB
Note that another Java-based application keeps feeding new data to the Solr index, even while the optimizes below were running. I noticed that Solr queries were running slow and thought of running an optimize. From the Solr web admin I clicked the Optimize button for core AA, and after some time [30-40 mins] I saw that the segments count was reduced to 1, indicating it got optimized. Next, I ran the same thing for the other core, BB. Its segments count kept increasing, so I thought it would be good to shut down the application feeding data to this index; by the time I closed that application, the segment count had reached a very high value, ~7200. I think it has stopped optimizing the index, because I see 2 red circles next to "Optimized" and "Current", and clicking the Optimize button does nothing [generally the small circle in the Optimize button keeps moving until the index gets fully optimized, which I didn't notice while optimizing core BB]. One thing I noticed is that the total index size stayed at around 202GB after reaching this high segment count, unlike core AA, where during optimization the index size increased to around 59GB and then reduced back to 30GB, and I saw 2 tick marks next to "Optimized" and "Current".
Since this is a production/live machine, I am a bit concerned and don't want to lose any data or end up with a corrupt index. Should I just restart Solr [it's running through Jetty]? Or is there any other step? Please advise on the right/optimum step that ensures core BB gets optimized and that no data loss or index corruption occurs. I am a bit worried, please help. Thanks in advance. Find attached the screenshots from the Solr admin pages for both core AA and core BB, showing the segments count & index size. Thanks again, DK
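For reference, two things that are sometimes done in this situation (a sketch, not advice from the original thread; the jar location, core name BB, and index path are assumptions): the index can be verified offline with Lucene's CheckIndex tool while Solr is stopped, and the optimize can be re-issued against the update handler with a maxSegments target so it does not have to merge all the way down to a single segment:

  # verify the index (run only while Solr is shut down; read-only check, no -fix)
  java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex /path/to/solr/BB/data/index

  # re-issue the optimize over HTTP, merging down to at most 10 segments
  curl 'http://localhost:8983/solr/BB/update?optimize=true&maxSegments=10'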
Store 2 dimensional array( of int values) in solr 4.0
Hi All, I'm trying to store a 2-dimensional array in Solr [version 4.0]. Basically I have the following data:
[[20121108, 1],[20121110, 7],[2012, 2],[20121112, 2]] ...
The inner array is used to keep some count, say X, for that particular day. Currently I'm using a multiValued string field to store this data, and I'm using the Python library pySolr to send it. Currently the data that gets stored looks like this (it's an array of strings):
[20121108, 1][20121110, 7][2012, 2][20121112, 2][20121113, 2][20121116, 1]
Is there a way I can store the 2-dimensional array so that the inner arrays contain int values, like the example at the beginning, such that the final/stored data in Solr looks something like:
20121108 7
20121110 12
20121110 12
Just a guess: I think for this case we need to add one more field [the index, for instance] for each inner array, which will again be multiValued (and will store int values only)? How do I add the actual 2-dimensional array, how do I pass the inner arrays, and how do I store the full doc that contains this 2-dimensional array? Please help me sort out this issue. Please share your views and point me in the right direction. Any help would be highly appreciated. I found similar things on the web, but not the one I'm looking for:
http://lucene.472066.n3.nabble.com/Two-dimensional-array-in-Solr-schema-td4003309.html
Thanks
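One possible layout along the lines of the "one more field" guess (a sketch, not from the thread; the field names and the "int" field type from the default example schema are assumptions): keep two parallel multiValued int fields, one for the date and one for the count, in schema.xml:

  <field name="date_x"  type="int" indexed="true" stored="true" multiValued="true"/>
  <field name="count_x" type="int" indexed="true" stored="true" multiValued="true"/>

Solr returns stored multiValued values in the order they were added, so date_x[i] and count_x[i] can be read back together as the i-th inner pair; the trade-off is that the pairing is only a convention on the client side, not something Solr enforces or can query as a unit.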
RE: Store 2 dimensional array( of int values) in solr 4.0
Hi, Thanks for the quick reply. Sure, please find the details below as per your query. Essentially, I want to retrieve the doc through JSON [using JSON format as the Solr result output] and want the data from the dataX field to come back as a two-dimensional array of ints. When I store the data as shown below, it shows up in JSON as an array of strings, where each internal array is rendered as a string (because that's how the field is configured and how I'm storing it; I'm not finding any other option). Following is the current JSON output that I'm able to fetch:
"dataX":["[20130614, 2]","[20130615, 11]","[20130616, 1]","[20130617, 1]","[20130619, 8]","[20130620, 5]","[20130623, 5]"]
whereas I want to fetch dataX as something like:
"dataX":[[20130614, 2],[20130615, 11],[20130616, 1],[20130617, 1],[20130619, 8],[20130620, 5],[20130623, 5]]
As can be seen, dataX is essentially a 2D array where each internal array holds two ints, one being the date and the other being the count. Please point me in the right direction. Appreciate your time. Thanks.
> From: j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Store 2 dimensional array( of int values) in solr 4.0
> Date: Fri, 6 Sep 2013 08:44:06 -0400
>
> First you need to tell us how you wish to use and query the data. That will
> largely determine how the data must be stored. Give us a few example queries
> of how you would like your application to be able to access the data.
>
> Note that Lucene has only simple multivalued fields - no structure or
> nesting within a single field other than a list of scalar values.
>
> But you can always store a complex structure as a BSON blob or JSON string
> if all you want is to store and retrieve it in its entirety without querying
> its internal structure. And note that Lucene queries are field level - does
> a field contain or match a scalar value.
>
> -- Jack Krupansky
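A minimal sketch of the JSON-string approach described in the reply, using pySolr (the core URL and the dataX_json field name are assumptions; the field would be a stored, non-indexed string field, since the structure is only stored and retrieved, not queried):

  import json
  import pysolr

  # core URL is an assumption
  solr = pysolr.Solr('http://localhost:8983/solr/collection1')

  # serialize the whole 2D int array into one stored string field
  doc = {
      'id': 'doc1',
      'dataX_json': json.dumps([[20130614, 2], [20130615, 11], [20130616, 1]]),
  }
  solr.add([doc])
  solr.commit()

  # at read time, decode the string back into a 2D array of ints
  result = next(iter(solr.search('id:doc1')))
  data_x = json.loads(result['dataX_json'])  # [[20130614, 2], [20130615, 11], ...]

The value still travels through Solr as a single string, but the client gets real nested int arrays back after json.loads, which is the shape the question asks for.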
How to optimize live production server SOLR Index
Hi All, I'm pretty new to Solr. Currently I'm using Solr 4.0, and we have two indexes, one around 30 GB and another around 180 GB. Each contains more than a million records. I was wondering what the best way is to optimize the index while continuing to serve user requests, and while the backend indexer keeps adding new documents through Solr commits. Please share your ideas and opinions, and any precautions to be taken while running the optimize, etc. Thanks in advance. Regards, DK
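For reference (a sketch, not a recommendation from the thread; the host and core name are assumptions), an optimize can be issued over HTTP at an off-peak time, and waitSearcher=false lets the call return without blocking on the new searcher while queries continue to be served from the current one:

  curl 'http://localhost:8983/solr/core1/update?optimize=true&waitSearcher=false'

One common precaution: the index can temporarily need up to roughly double its size in free disk space while the merged segments are being written, before the old ones are deleted.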
How to combine Date range query with negation query
Hi All, I'm trying to run a query against an author_location field and a created_at date field. For the majority of the documents author_location is the default value, i.e. "unset". I want to run a query where author_location has some value other than "unset" and the created_at field is greater than a given timestamp. I tried running the following query:
-author_location:unset&created_at:[2013-03-10T06:30:21Z TO *]
but it's not working, and it returns results that contain author_location=unset. I also tried using a filter query, but it seems that query is not correct or something like that, as I'm still getting results that include author_location=unset documents. Would appreciate it if someone could point me to the right query. Please note that I'm running Solr: solr-spec 4.0.0.2012.10.06.03.04.33 on a Linux machine. Thanks in advance. Regards, DK
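For reference (a sketch, not from the thread): in a request URL the "&" separates parameters, so everything after it stops being part of q. Putting both clauses inside a single q parameter, or splitting them into filter queries, would look like:

  q=created_at:[2013-03-10T06:30:21Z TO *] AND -author_location:unset

or, keeping q open and filtering:

  q=*:*&fq=created_at:[2013-03-10T06:30:21Z TO *]&fq=-author_location:unset

(the values need URL-encoding when sent over HTTP, e.g. the space before "TO" and the square brackets).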
how to specify default sort fields in solr schema?
Hi all, Is there a way to specify default sort fields in the Solr schema, in 3.6 or 4.0 Beta? Similar to the default search operator: the default operator can be set to "OR", and you only pass "AND" in the search URL when you want to override it. I have lots of fields related to the docs I'm indexing and want the results to be sorted on a certain set of fields by default. Would appreciate any help in this direction. Thanks in advance. Regards, DK
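For reference (a sketch; the handler name and the sort fields are assumptions): a default sort is normally configured in solrconfig.xml rather than the schema, as a default parameter on the request handler, which a sort parameter in the query URL then overrides per request:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="sort">created_at desc, score desc</str>
    </lst>
  </requestHandler>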