Also be aware that by default Solr is configured to index only the
first 10,000 tokens (not lines) of each field.
See maxFieldLength in solrconfig.xml.
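
Note that maxFieldLength is counted in tokens per field. As a rough check of which
files would be cut off, something like the sketch below may help. It is only an
approximation: the regex stands in for the real analyzer, and the helper names
(approx_token_count, would_be_truncated) are made up for illustration.

```python
import re

# Default limit discussed above; adjust to match your solrconfig.xml.
DEFAULT_MAX_FIELD_LENGTH = 10_000


def approx_token_count(text):
    """Approximate the token count by counting runs of word characters.

    This is NOT what StandardTokenizer does exactly; it is a cheap estimate.
    """
    return len(re.findall(r"\w+", text))


def would_be_truncated(text, max_field_length=DEFAULT_MAX_FIELD_LENGTH):
    """Return True if the text likely has more tokens than the limit allows."""
    return approx_token_count(text) > max_field_length
```

Anything flagged by would_be_truncated is a candidate for either raising the
limit or splitting the file before indexing.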
Best
Erick

On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam <ps...@mac.com> wrote:
> Thanks for your note, Anand.  What was the maximum chunk size for you?  Could 
> you post the relevant portions of your configuration file?
>
>
> Thanks!
> Pete
>
> On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
>
>> Hi,
>>
>> I was also facing the issue of highlighting large text files. I applied 
>> the solution proposed here and it worked, but I am now getting the 
>> following error (full message and stack trace below).
>>
>> Basically, 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I 
>> get this file? It is referenced in browse.vm:
>>
>> <div class="results">
>>  #if($response.response.get('grouped'))
>>    #foreach($grouping in $response.response.get('grouped'))
>>      #parse("hitGrouped.vm")
>>    #end
>>  #else
>>    #foreach($doc in $response.results)
>>      #parse("hit.vm")
>>    #end
>>  #end
>> </div>
>>
>>
>> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or 
>> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>
>> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in classpath 
>> or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>   at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>>   at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>>   at org.apache.velocity.Template.process(Template.java:98)
>>   at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>>   at
>>
>> Thanks & Regards,
>> Anand
>> Anand Nigam
>> RBS Global Banking & Markets
>> Office: +91 124 492 5506
>>
>>
>> -----Original Message-----
>> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
>> Sent: 21 October 2011 14:58
>> To: solr-user@lucene.apache.org
>> Subject: Re: Can Solr handle large text files?
>>
>> Hi Peter,
>>
>> Highlighting in large text files cannot be fast without dividing the 
>> original text into small pieces.
>> So take a look at
>> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
>> and at
>> http://www.lucidimagination.com/blog/2010/09/16/2446/
>>
>> Which means that you should divide your files and use Result Grouping / 
>> Field Collapsing to list only one hit per original document.
>>
>> (xtf also would solve your problem "out of the box" but xtf does not use 
>> solr).
>>
>> Best regards
>>  Karsten
>>
>> -------- Original Message --------
>>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>>> From: Peter Spam <ps...@mac.com>
>>> To: solr-user@lucene.apache.org
>>> Subject: Can Solr handle large text files?
>>
>>> I have about 20k text files, some very small, but some up to 300MB,
>>> and would like to do text searching with highlighting.
>>>
>>> Imagine the text is the contents of your syslog.
>>>
>>> I would like to type in some terms, such as "error" and "mail", and
>>> have Solr return the syslog lines with those terms PLUS two lines of 
>>> context.
>>> Pretty much just like Google's highlighting.
>>>
>>> 1) Can Solr handle this?  I had extremely long query times when I
>>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.).  I
>>> tried breaking the files into 1MB pieces, but searching would be wonky
>>> => return the wrong number of documents (i.e., if one file had a term 5
>>> times, and that was the only file that had the term, I want 1 result, not 5 
>>> results).
>>>
>>> 2) What sort of tokenizer would be best?  Here's what I'm using:
>>>
>>>   <field name="body" type="text_pl" indexed="true" stored="true"
>>> multiValued="false" termVectors="true" termPositions="true"
>>> termOffsets="true" />
>>>
>>>    <fieldType name="text_pl" class="solr.TextField">
>>>      <analyzer>
>>>        <tokenizer class="solr.StandardTokenizerFactory"/>
>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>        <filter class="solr.WordDelimiterFilterFactory"
>>> generateWordParts="0" generateNumberParts="0" catenateWords="0" 
>>> catenateNumbers="0"
>>> catenateAll="0" splitOnCaseChange="0"/>
>>>      </analyzer>
>>>    </fieldType>
>>>
>>>
>>> Thanks!
>>> Pete
>>
>>
>
>
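
A minimal sketch of the approach suggested in this thread: split each large file
into chunks, index every chunk as its own document, and use Result Grouping so
that a file with many matching chunks still shows up as one result. The field
names (id, file_id, body), the chunk size, and the two helper functions are
assumptions for illustration, not something taken from the posters' setups;
file_id is assumed to be a single-valued, indexed string field.

```python
from urllib.parse import urlencode


def chunk_lines(lines, file_id, lines_per_chunk=1000, overlap=2):
    """Yield one document (a dict) per chunk of a large text file.

    Consecutive chunks share `overlap` lines so that a hit near a chunk
    boundary still has a couple of lines of surrounding context.
    """
    if lines_per_chunk <= overlap:
        raise ValueError("lines_per_chunk must be larger than overlap")
    step = lines_per_chunk - overlap
    for chunk_no, start in enumerate(range(0, len(lines), step)):
        yield {
            "id": f"{file_id}#{chunk_no}",  # unique per chunk
            "file_id": file_id,             # shared by all chunks of one file
            "body": "".join(lines[start:start + lines_per_chunk]),
        }
        # Stop once this chunk has reached the end of the file.
        if start + lines_per_chunk >= len(lines):
            break


def grouped_query_params(terms, group_field="file_id"):
    """Build query parameters that collapse chunk hits to one per file."""
    return urlencode({
        "q": f"body:({terms})",
        "group": "true",             # Result Grouping / Field Collapsing
        "group.field": group_field,  # one group per original file
        "group.limit": 1,            # show only the best chunk per file
        "hl": "true",
        "hl.fl": "body",
        "wt": "json",
    })
```

For example, list(chunk_lines(open(path).readlines(), "syslog-01")) produces the
documents to post to the update handler, and
grouped_query_params("error AND mail") gives the query string to append to the
select URL. Smaller chunks keep highlighting cheap at the cost of more
documents, so the chunk size is worth tuning against your own data.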
