How to show just the parent domains from results in Solr

2013-07-02 Thread A Geek
Hi all, I've indexed documents in my Solr 4.0 index, with fields like URL, 
page_content, etc. When I run a search query against page_content, I get 
a lot of URLs. Say I have 15 URL domains in total, and all the pages under 
these 15 domains are indexed in Solr. Is there a way to get just the parent 
URLs for the search results instead of every matching URL? 
For example, say searching for "abc" returns:
www.aa.com/11.html
www.aa.com/12.html
www.aa.com/13.html
www.bb.com/15.html
www.bb.com/18.html
I want the results to look like this:
www.aa.com
www.bb.com
Is there a way to achieve this in Solr? I've tried 
FieldCollapsing [ https://wiki.apache.org/solr/FieldCollapsing ], but either it's 
not the right solution or I'm not using it properly. Could someone help 
me find a solution to the above problem? Thanks in advance. 
Regards, KK
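
One possible approach, sketched here under assumptions rather than taken from the thread: extract the host from each URL at index time, store it in its own single-valued string field, and group on that field at query time. The field name "domain" and the helper names below are hypothetical; group, group.field and group.limit are the standard result-grouping request parameters.

```python
from urllib.parse import urlparse


def parent_domain(url):
    """Return the host part of a URL, e.g. 'www.aa.com' for 'www.aa.com/11.html'."""
    # A URL without a scheme would otherwise be parsed as a bare path.
    if "://" not in url:
        url = "//" + url
    return urlparse(url).hostname or ""


def grouping_params(query, domain_field="domain"):
    """Build request parameters that return one group per domain.

    'domain' is a hypothetical field that would have to be added to the
    schema and filled with parent_domain(url) when indexing.
    """
    return {
        "q": query,
        "group": "true",
        "group.field": domain_field,
        "group.limit": 1,
        "fl": domain_field,
    }


def unique_domains(urls):
    """Client-side fallback: de-duplicate hosts, keeping first-seen order."""
    seen = []
    for url in urls:
        host = parent_domain(url)
        if host and host not in seen:
            seen.append(host)
    return seen
```

Grouping on the raw URL field cannot produce this result, because every URL is its own distinct value; the collapse needs a field that holds only the host.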

  

Segments count increased to around 7200, index remains unoptimized

2013-07-12 Thread A Geek
Hi All, I'm running Solr 4.0 on a Linux machine with around 30GB RAM. I have 2 
cores running under Solr, as below:
Core AA: around 30 GB data, segments count = 30
Core BB: around 216 GB data, segments count = 300
Solr is running through Jetty and I've allocated a max of 12GB heap memory 
through Java like: java -Xms4g -Xmx12g
Note that I have another Java-based application which keeps feeding new data to 
the Solr index, even while the following optimizes were running. I noticed that 
Solr queries were running slow and thought of running an optimize. So from the 
Solr web admin, I clicked the Optimize button for core AA, and after some time 
[30-40 mins] I saw that the segments count was reduced to 1, indicating it got 
optimized. Next, I ran the same thing for the other core, BB. The segments count 
kept increasing, so I thought it would be good to shut down the application 
which is feeding data to this index, and by the time I closed this application, 
the segment count had reached a very high value, ~7200. I think it stopped 
optimizing the index, because I see 2 red circles next to "optimized" and 
"Current", and clicking the Optimize button does nothing. [Generally 
the small circle in the Optimize button keeps moving until the index is fully 
optimized, which I didn't notice while optimizing core BB.] One thing I noticed 
is that the total index size remained at around 202GB after reaching this high 
segment count, unlike core AA, where during optimization the index size 
increased to around 59GB and then dropped to 30GB, and I saw 2 tick marks next 
to "optimized" and "Current". Since this is a production/live machine, I am a 
bit concerned and don't want to lose any data or end up with a corrupt index. 
Should I just restart Solr [it's running through Jetty]? Or take some other step? 
Please advise on the right step to ensure that core BB gets optimized without 
any data loss or index corruption. I am a bit worried, please help. Thanks in 
advance.
Please find attached the screenshots from the Solr admin pages for both core AA 
and core BB, showing the segments count and index size.
Thanks again, DK

Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread A Geek
hi All, I'm trying to store a 2 dimensional array in SOLR [version 4.0]. 
Basically I've the following data: 
[[20121108, 1],[20121110, 7],[2012, 2],[20121112, 2]] ...

The inner array is used to keep some count, say X, for that particular day. 
Currently, I'm using the following field to store this data: 

and I'm using the Python library pySolr to store the data. Currently the data 
that gets stored looks like this (it's an array of strings):
[20121108, 1] [20121110, 7] [2012, 2] [20121112, 2] [20121113, 2] [20121116, 1]
Is there a way I can store the 2 dimensional array so that the inner array 
contains int values, like the one shown in the example at the beginning, such 
that the final/stored data in Solr looks something like: 
20121108  7  
 20121110 12 
 20121110 12 

Just a guess: I think for this case we need to add one more field [the index, 
for instance] for each inner array, which will again be multivalued (and 
store int values only)? How do I add the actual 2 dimensional array, how do I 
pass the inner arrays, and how do I store the full doc that contains this 2 
dimensional array? Please help me sort out this issue.
Please share your views and point me in the right direction. Any help would be 
highly appreciated. 
I found similar things on the web, but not the one I'm looking for: 
http://lucene.472066.n3.nabble.com/Two-dimensional-array-in-Solr-schema-td4003309.html
Thanks
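
A minimal sketch of the "extra multivalued field" idea guessed at above, assuming it is acceptable to split the pairs into two parallel multivalued int fields. The field names dataX_dates and dataX_counts are hypothetical and would need matching schema entries; the sketch also assumes the values of a stored multivalued field come back in the order they were added.

```python
def split_pairs(pairs):
    """Split [[date, count], ...] into two parallel lists of ints."""
    dates = [int(date) for date, _ in pairs]
    counts = [int(count) for _, count in pairs]
    return dates, counts


def join_pairs(dates, counts):
    """Rebuild [[date, count], ...] from the two parallel lists."""
    if len(dates) != len(counts):
        raise ValueError("parallel fields must have the same length")
    return [[date, count] for date, count in zip(dates, counts)]


def to_solr_doc(doc_id, pairs):
    """Build a document dict; both field names are hypothetical."""
    dates, counts = split_pairs(pairs)
    return {"id": doc_id, "dataX_dates": dates, "dataX_counts": counts}
```

A dict shaped like this could be passed to pySolr's add() in a list; the client would call join_pairs() on the two fields after fetching the document.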

RE: Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread A Geek
Hi, thanks for the quick reply. Sure, please find the details below, as per 
your query.
Essentially, I want to retrieve the doc as JSON [using JSON as the Solr result 
output format] and want the JSON to present the data from the dataX field as a 
two dimensional array of ints. When I store the data as shown below, it shows up 
in JSON as an array of strings, where each inner array is rendered as a string 
(because that's how the field is configured and how I'm storing it; I haven't 
found any other option). Following is the current JSON output that I'm able to 
fetch: 
"dataX":["[20130614, 2]","[20130615, 11]","[20130616, 1]","[20130617, 1]","[20130619, 8]","[20130620, 5]","[20130623, 5]"]
whereas I want to fetch dataX as something like: 
"dataX":[[20130614, 2],[20130615, 11],[20130616, 1],[20130617, 1],[20130619, 8],[20130620, 5],[20130623, 5]]
As can be seen, dataX is essentially a 2D array where each inner array holds 
two ints, one being the date and the other the count.
Please point me in the right direction. Appreciate your time.
Thanks.
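
One client-side option, in line with the JSON-string suggestion in the quoted reply: keep dataX as a multivalued string field, write each inner pair as a JSON string, and decode the strings after fetching. This is only a sketch; the function names are hypothetical, and it assumes the pairs never need to be queried individually inside Solr.

```python
import json


def encode_pairs(pairs):
    """Turn [[date, count], ...] into the list of strings stored in dataX."""
    return [json.dumps([int(date), int(count)]) for date, count in pairs]


def decode_pairs(stored):
    """Turn the stored strings back into a 2D list of ints."""
    return [json.loads(item) for item in stored]


# Example using the values shown in the message above.
stored = ["[20130614, 2]", "[20130615, 11]", "[20130616, 1]"]
decoded = decode_pairs(stored)
```

Because each stored value is already valid JSON, decoding is a single json.loads() call per element and needs no schema change.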

> From: j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Store 2 dimensional array( of int values) in solr 4.0
> Date: Fri, 6 Sep 2013 08:44:06 -0400
> 
> First you need to tell us how you wish to use and query the data. That will 
> largely determine how the data must be stored. Give us a few example queries 
> of how you would like your application to be able to access the data.
> 
> Note that Lucene has only simple multivalued fields - no structure or 
> nesting within a single field other than a list of scalar values.
> 
> But you can always store a complex structure as a BSON blob or JSON string 
> if all you want is to store and retrieve it in its entirety without querying 
> its internal structure. And note that Lucene queries are field level - does 
> a field contain or match a scalar value.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: A Geek
> Sent: Friday, September 06, 2013 7:10 AM
> To: solr user
> Subject: Store 2 dimensional array( of int values) in solr 4.0
> 
> hi All, I'm trying to store a 2 dimensional array in SOLR [version 4.0]. 
> Basically I've the following data:
> [[20121108, 1],[20121110, 7],[2012, 2],[20121112, 2]] ...
> 
> The inner array being used to keep some count say X for that particular day. 
> Currently, I'm using the following field to store this data:
>  multiValued="true"/>
> and I'm using python library pySolr to store the data. Currently the data 
> that gets stored looks like this(its array of strings)
> [20121108, 1][20121110, 
> 7][2012, 2][20121112, 2][20121113, 
> 2][20121116, 1]
> Is there a way, i can store the 2 dimensional array and the inner array can 
> contain int values, like the one shown in the beginning example, such that 
> the the final/stored data in SOLR looks something like: 
> 20121108  7  
>  20121110 12 
>  20121110 12 
> 
> Just a guess, I think for this case, we need to add one more field[the index 
> for instance], for each inner array which will again be multivalued (which 
> will store int values only)? How do I add the actual 2 dimensional array, 
> how to pass the inner arrays and how to store the full doc that contains 
> this 2 dimensional array. Please help me out sort this issue.
> Please share your views and point me in the right direction. Any help would 
> be highly appreciated.
> I found similar things on the web, but not the one I'm looking for: 
> http://lucene.472066.n3.nabble.com/Two-dimensional-array-in-Solr-schema-td4003309.html
> Thanks 
> 
  

How to optimize live production server SOLR Index

2013-03-30 Thread A Geek
Hi All, I'm pretty new to Solr. Currently I'm using Solr 4.0, and we have 
two indexes, one around 30 GB in size and another around 180 GB. Each 
contains more than a million records. I was wondering what the best way is to 
optimize the index while still serving user requests and while the backend 
indexer keeps adding new documents through Solr commits. Please share your 
ideas and opinions, and any precautions to take while running the optimize. 

Thanks in advance.
Regards, DK
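
For reference, an optimize can also be requested through the update handler instead of the admin UI. The sketch below only builds the request URL. The base URL and the core name "mycore" are assumptions, and the function name is hypothetical; optimize, maxSegments and waitSearcher are standard update parameters.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, max_segments=None, wait_searcher=True):
    """Build the update-handler URL that asks Solr to optimize one core."""
    params = {
        "optimize": "true",
        "waitSearcher": "true" if wait_searcher else "false",
    }
    if max_segments is not None:
        # A partial optimize merges down to at most this many segments.
        params["maxSegments"] = str(max_segments)
    return "{}/{}/update?{}".format(base_url.rstrip("/"), core, urlencode(params))


# Assumed host, port and core name, for illustration only.
url = optimize_url("http://localhost:8983/solr", "mycore", max_segments=10)
```

Setting maxSegments above 1 is a gentler alternative to a full optimize on a large, live index, since fewer segments have to be rewritten. Whether that is enough depends on the workload.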

  

How to combine Date range query with negation query

2013-03-10 Thread A Geek

Hi All, I'm trying to run a query against two fields, author_location and 
created_at. For the majority of the documents, author_location has the default 
value, i.e. "unset". I want to run a query where author_location has some value 
other than "unset" and the created_at field is greater than a given timestamp. I 
tried running the following query: 
-author_location:unset&created_at:[2013-03-10T06:30:21Z TO *]
but it's not working, and it's returning results which contain 
author_location=unset. I also tried a filter query, but it seems the query is 
not correct, as I'm still getting results that include author_location=unset 
documents.
I would appreciate it if someone could point me to the right query. Please note 
that I'm running Solr (solr-spec 4.0.0.2012.10.06.03.04.33) on a Linux machine.
Thanks in advance. 
Regards, DK
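
A sketch of one way to express this, not a confirmed fix for the behaviour described above: put the date range in q and the negation in a filter query (fq), and URL-encode both. A raw "&" inside a URL starts a new request parameter, so in the query shown above the range clause is not part of q at all. The function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def build_query(since, location_field="author_location",
                date_field="created_at", unset_value="unset"):
    """Return an encoded query string: date range in q, negation in fq."""
    params = [
        # Square brackets make the lower bound inclusive.
        ("q", "{}:[{} TO *]".format(date_field, since)),
        # Filter out documents whose location still has the default value.
        ("fq", "-{}:{}".format(location_field, unset_value)),
    ]
    return urlencode(params)


query_string = build_query("2013-03-10T06:30:21Z")
```

The same conditions can also go into a single q value joined with AND, provided the whole value is URL-encoded as one parameter.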

  

how to specify default sort fields in solr schema?

2012-10-03 Thread A Geek

Hi all, is there a way to specify default sort fields in the Solr schema in 
3.6 or 4.0 Beta, similar to the default search operator? For example, the 
default search operator can be set to "OR", and you pass "AND" in the search 
URL only when you want to override it. I have lots of fields on the docs I'm 
indexing and want the results to be sorted on a certain set of fields by default. 
I would appreciate any help in this direction. Thanks in advance. 
Regards, DK