Your previous mail was not sent to the mailing list, so I am forwarding it.

---------- Forwarded message ----------
From: Vineet Mishra <clearmido...@gmail.com>
Date: 2014-05-06 14:33 GMT+03:00
Subject: Re: Indexing Big Data With or Without Solr
To: Furkan KAMACI <furkankam...@gmail.com>


Hi Furkan,

No, not the metadata; I am planning to store the sensor data itself. FYI,
http://www.freescale.com/webapp/sps/site/overview.jsp?code=SD_DATAFILEFORMAT
shows what the sensor data will look like. Moreover, assume the number of
columns extends to 200 and the number of rows is around 1 lakh (100,000)
per file; I will have around 1 lakh such files.
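
As a rough sanity check on those figures, the sketch below multiplies them
out in Python. The 2 bytes per value is purely an assumption for
illustration (the actual encoding of each reading is not stated here), and
the function and variable names are made up for this example.

```python
# Back-of-envelope sizing for the sensor data described above.
LAKH = 100_000  # 1 lakh = 100,000


def estimate_cells(columns: int, rows_per_file: int, files: int) -> int:
    """Total number of individual sensor values across all files."""
    return columns * rows_per_file * files


def estimate_bytes(cells: int, bytes_per_value: int) -> int:
    """Raw size, ignoring file headers, delimiters, and index overhead."""
    return cells * bytes_per_value


cells = estimate_cells(columns=200, rows_per_file=LAKH, files=LAKH)
# ASSUMPTION: 2 bytes per stored value; adjust to the real encoding.
raw_bytes = estimate_bytes(cells, bytes_per_value=2)

print(f"{cells:,} values, about {raw_bytes / 10**12:.1f} TB raw")
```

With that assumed width, the total lands near the roughly 4 TB mentioned in
the original question; a wider encoding would scale it proportionally.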

Hardware spec: 6 Xeon Processor machine, 2.13 GHz, 16 GB; the HDD can be
extended, no issue.

Thanks!



On Tue, May 6, 2014 at 4:31 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> Hi Vineet;
>
> I remove those HTML tags and stop words (high-frequency terms are
> removed). However, you are right that sensor data and web data have
> different characteristics. Could you tell me what kind of information you
> store in each record (geolocation, name, description, etc.)? Also, could
> you tell me more about your hardware infrastructure?
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-05-06 13:40 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>
>> Hi Furkan,
>>
>> Indexing documents and indexing raw digital sensor data are completely
>> different. In your case, web documents have many repeated tokens; with
>> web pages it is quite obvious that words like *div, span, title, style,
>> etc.* will repeat. That is an ideal case for Solr, since the inverted
>> index pays off there. But if you look closely at my requirement, I have
>> sensor data; moreover, the data will be huge and there is hardly any
>> chance of repetition.
>>
>> What do you say: how suitable would Solr be in that case?
>>
>> [Open Question - Expert advice needed]
>>
>> Thanks!
>>
>>
>>
>> On Tue, Apr 29, 2014 at 1:41 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>
>>> Hi Vineet;
>>>
>>> Many millions of documents (web pages), with an average response time
>>> of less than 10 ms.
>>>
>>> Thanks;
>>> Furkan KAMACI
>>>
>>>
>>> 2014-04-29 10:55 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>>>
>>>> Hi Furkan,
>>>>
>>>> Can you specify what type and size of data you have?
>>>> Also, what are your index size and query response time?
>>>>
>>>> Thanks
>>>> Vineet
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Furkan KAMACI <furkankam...@gmail.com>
>>>> Date: Tue, Apr 15, 2014 at 7:53 PM
>>>> Subject: Re: Indexing Big Data With or Without Solr
>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>
>>>>
>>>> Hi Vineet;
>>>>
>>>> I've been using SolrCloud for this kind of Big Data, and I think you
>>>> should consider using it. If you have any problems, you can ask here.
>>>>
>>>> Thanks;
>>>> Furkan KAMACI
>>>>
>>>>
>>>> 2014-04-15 13:20 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I have worked with Solr 3.5 to implement real-time search on some
>>>> > 100 GB of data. That worked fine but was a little slow on complex
>>>> > queries (multiple grouped/joined queries).
>>>> > Now I want to index some real Big Data (around 4 TB or even more).
>>>> > Can SolrCloud be a solution for it? If not, what would be the best
>>>> > possible solution in this case?
>>>> >
>>>> > *Stats for the previous implementation:*
>>>> > It was a master-slave architecture with multiple standalone
>>>> > instances of Solr 3.5. There were around 12 Solr instances running
>>>> > on different machines.
>>>> >
>>>> > *Things to consider for the next implementation:*
>>>> > Since all the data is sensor data, duplication and uniqueness are
>>>> > factors to consider.
>>>> >
>>>> > *Really urgent; please treat this as a priority and suggest a set
>>>> > of feasible solutions.*
>>>> >
>>>> > Regards
>>>> >
>>>>
>>>>
>>>
>>
>
