Your previous mail was not sent to the mailing list, so I am forwarding it.

---------- Forwarded message ----------
From: Vineet Mishra <clearmido...@gmail.com>
Date: 2014-05-06 14:33 GMT+03:00
Subject: Re: Indexing Big Data With or Without Solr
To: Furkan KAMACI <furkankam...@gmail.com>
Hi Furkan,

No, not the metadata. I am planning to store the sensor data itself. FYI, this is what the sensor data will look like:
http://www.freescale.com/webapp/sps/site/overview.jsp?code=SD_DATAFILEFORMAT

You can also expect the number of columns to grow to 200 and the number of rows to be around 1 lakh (100,000). I will have around 1 lakh such files.

Hardware spec: 6 Xeon processor machine, 2.13 GHz, 16 GB; the HDD can be extended, no issue.

Thanks!

On Tue, May 6, 2014 at 4:31 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> Hi Vineet;
>
> I remove those kinds of HTML tags and stop words (high-frequency terms are
> removed). However, you are right that sensor data and web data have
> different characteristics. Could you tell me what kind of information you
> store in each record (geolocation, name, description, etc.)? Also, could
> you tell me more about your hardware infrastructure?
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-05-06 13:40 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>
>> Hi Furkan,
>>
>> Indexing documents and indexing raw digital sensor data are completely
>> different. In your case, web documents contain many repeated tokens: with
>> web pages it is quite natural to see words such as *div, span, title,
>> style, etc.* repeated over and over. That is the ideal case for Solr,
>> since the inverted index benefits from the repetition. But if you look
>> closely at my requirement, I have sensor data. The data will also be
>> huge, and there is hardly any chance of repetition.
>>
>> What do you think? How suitable will Solr be?
>>
>> [Open Question - Expert advice needed]
>>
>> Thanks!
>>
>>
>>
>> On Tue, Apr 29, 2014 at 1:41 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>
>>> Hi Vineet;
>>>
>>> Many millions of documents (web pages), with an average response time
>>> of less than 10 ms.
>>>
>>> Thanks;
>>> Furkan KAMACI
>>>
>>>
>>> 2014-04-29 10:55 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>>>
>>>> Hi Furkan,
>>>>
>>>> Can you specify what type and size of data you have?
>>>> Also, what are your index size and query response time?
>>>>
>>>> Thanks
>>>> Vineet
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Furkan KAMACI <furkankam...@gmail.com>
>>>> Date: Tue, Apr 15, 2014 at 7:53 PM
>>>> Subject: Re: Indexing Big Data With or Without Solr
>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>
>>>>
>>>> Hi Vineet;
>>>>
>>>> I've been using SolrCloud for this kind of Big Data, and I think you
>>>> should consider using it. If you have any problems, you can ask here.
>>>>
>>>> Thanks;
>>>> Furkan KAMACI
>>>>
>>>>
>>>> 2014-04-15 13:20 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I have worked with Solr 3.5 to implement real-time search on about
>>>> > 100 GB of data. It worked fine but was a little slow on complex queries
>>>> > (multiple group/join queries).
>>>> > Now I want to index some real Big Data (around 4 TB or even more).
>>>> > Can SolrCloud be a solution for it? If not, what could be the best
>>>> > possible solution in this case?
>>>> >
>>>> > *Stats for the previous implementation:*
>>>> > It was a master-slave architecture with multiple normal standalone
>>>> > instances of Solr 3.5. There were around 12 Solr instances running on
>>>> > different machines.
>>>> >
>>>> > *Things to consider for the next implementation:*
>>>> > Since all the data is sensor data, duplication and uniqueness are
>>>> > factors to consider.
>>>> >
>>>> > *Really urgent; please treat this as a priority and suggest a set of
>>>> > feasible solutions.*
>>>> >
>>>> > Regards
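For a rough sense of scale, the figures Vineet gives (200 columns, about 1 lakh rows per file, about 1 lakh files) can be multiplied out. The sketch below is a back-of-the-envelope estimate only. Three inputs are assumptions and do not come from the thread: 8 bytes per stored value, treating each row as one Solr document, and a per-index ceiling of roughly 2^31 documents (Lucene document IDs are Java ints, so the exact limit is slightly lower and varies by version).

```python
import math

# Figures quoted in the thread (Vineet's mail of 2014-05-06).
COLUMNS_PER_ROW = 200           # "# of Columns to be extended to 200"
ROWS_PER_FILE = 100_000         # "# of rows to be around 1 Lakh"
FILE_COUNT = 100_000            # "around 1 Lakhs different Files"

# Assumptions, NOT from the thread:
BYTES_PER_VALUE = 8             # hypothetical: each reading stored as a 64-bit float
MAX_DOCS_PER_INDEX = 2**31 - 1  # approximate Lucene per-index document ceiling


def sizing(columns: int, rows: int, files: int, bytes_per_value: int) -> dict:
    """Back-of-the-envelope volume estimate for the sensor data set."""
    total_rows = rows * files
    total_values = total_rows * columns
    raw_bytes = total_values * bytes_per_value  # raw readings only, no index overhead
    return {
        "total_rows": total_rows,
        "total_values": total_values,
        "raw_tb": raw_bytes / 10**12,  # decimal terabytes
        # If every row became one Solr document, a single Lucene index could not
        # hold them all; this is the minimum shard count on that basis alone.
        "min_shards_one_doc_per_row": math.ceil(total_rows / MAX_DOCS_PER_INDEX),
    }


if __name__ == "__main__":
    estimate = sizing(COLUMNS_PER_ROW, ROWS_PER_FILE, FILE_COUNT, BYTES_PER_VALUE)
    for key, value in estimate.items():
        print(f"{key}: {value}")
```

Under these assumptions the data set comes to 10 billion rows and 2 trillion values, or about 16 TB of raw readings before any index overhead. With one document per row, the data would have to be split across at least 5 shards. Storing one document per file, or grouping rows some other way, would change these numbers substantially.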