Also be aware that by default Solr is configured to index only the first 10,000 tokens of each field, so anything past that point in a large file is silently left out of the index. See maxFieldLength in solrconfig.xml.
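For reference, the setting looks roughly like the fragment below in a Solr 3.x solrconfig.xml. This is only a sketch: the raised value is an illustrative assumption (the largest 32-bit integer, which effectively removes the limit), and if your config also sets maxFieldLength under <mainIndex>, that copy would presumably need the same change. Documents indexed under the old limit have to be reindexed to pick it up.

```xml
<!-- solrconfig.xml (Solr 3.x layout). The value below is an assumed example, not a recommendation. -->
<indexDefaults>
  <!-- Maximum number of tokens indexed per field; the stock example config ships with 10000 -->
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>
```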

Best
Erick

On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam <ps...@mac.com> wrote:
> Thanks for your note, Anand. What was the maximum chunk size for you? Could
> you post the relevant portions of your configuration file?
>
>
> Thanks!
> Pete
>
> On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
>
>> Hi,
>>
>> I was also facing the issue of highlighting large text files. I applied
>> the solution proposed here and it worked. But I am getting the following error:
>>
>>
>> Basically 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I
>> get this file from? It is referenced in browse.vm:
>>
>> <div class="results">
>>   #if($response.response.get('grouped'))
>>     #foreach($grouping in $response.response.get('grouped'))
>>       #parse("hitGrouped.vm")
>>     #end
>>   #else
>>     #foreach($doc in $response.results)
>>       #parse("hit.vm")
>>     #end
>>   #end
>> </div>
>>
>>
>> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or
>> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in classpath
>> or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>   at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>>   at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>>   at org.apache.velocity.Template.process(Template.java:98)
>>   at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>>   at
>>
>> Thanks & Regards,
>> Anand
>> Anand Nigam
>> RBS Global Banking & Markets
>> Office: +91 124 492 5506
>>
>>
>> -----Original Message-----
>> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
>> Sent: 21 October 2011 14:58
>> To: solr-user@lucene.apache.org
>> Subject: Re: Can Solr handle large text files?
>>
>> Hi Peter,
>>
>> highlighting in large text files cannot be fast without dividing the
>> original text into small pieces.
>> So take a look at
>> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
>> and at
>> http://www.lucidimagination.com/blog/2010/09/16/2446/
>>
>> Which means that you should divide your files and use Result Grouping /
>> Field Collapsing to list only one hit per original document.
>>
>> (xtf would also solve your problem "out of the box", but xtf does not use
>> Solr).
>>
>> Best regards
>> Karsten
>>
>> -------- Original Message --------
>>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>>> From: Peter Spam <ps...@mac.com>
>>> To: solr-user@lucene.apache.org
>>> Subject: Can Solr handle large text files?
>>
>>> I have about 20k text files, some very small, but some up to 300MB,
>>> and would like to do text searching with highlighting.
>>>
>>> Imagine the text is the contents of your syslog.
>>>
>>> I would like to type in some terms, such as "error" and "mail", and
>>> have Solr return the syslog lines with those terms PLUS two lines of
>>> context.
>>> Pretty much just like Google's highlighting.
>>>
>>> 1) Can Solr handle this? I had extremely long query times when I
>>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.). I
>>> tried breaking the files into 1MB pieces, but searching would be wonky
>>> => return the wrong number of documents (i.e. if one file had a term 5
>>> times, and that was the only file that had the term, I want 1 result, not 5
>>> results).
>>>
>>> 2) What sort of tokenizer would be best?
>>> Here's what I'm using:
>>>
>>> <field name="body" type="text_pl" indexed="true" stored="true"
>>>        multiValued="false" termVectors="true" termPositions="true"
>>>        termOffsets="true" />
>>>
>>> <fieldType name="text_pl" class="solr.TextField">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>             generateWordParts="0" generateNumberParts="0" catenateWords="0"
>>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>>
>>> Thanks!
>>> Pete
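
On the configuration question raised in the thread: the chunk-and-group approach Karsten describes could be wired up with request handler defaults along the lines below. This is a rough sketch, not anyone's actual setup. The handler name /chunked and the parent_id field (assumed to hold the original file's ID on every chunk document) are hypothetical; body is the field from Pete's schema. The group and hl parameters are standard Solr request parameters, with Result Grouping available from Solr 3.3 onward, but how highlighting behaves on grouped results is worth verifying on your own version.

```xml
<!-- solrconfig.xml: hypothetical handler; "/chunked" and "parent_id" are assumed names -->
<requestHandler name="/chunked" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Result Grouping: collapse all chunks of one original file into a single group -->
    <str name="group">true</str>
    <str name="group.field">parent_id</str>
    <!-- Return only the best-scoring chunk per original file -->
    <str name="group.limit">1</str>
    <!-- Highlight within the matching chunk instead of the whole file -->
    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <str name="hl.snippets">3</str>
  </lst>
</requestHandler>
```

With something like this in place, a file whose chunks match a term five times would come back as one group rather than five separate hits. The parent_id field would most likely need to be a single-valued, indexed string field for grouping to work on it.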