OutOfMemoryError coming from TermVectorsReader

2011-09-18 Thread Anand.Nigam
Hi,

I am new to Solr. I am trying to index large text documents. When I search
the indexed documents, I get the following OutOfMemoryError.
Please help me resolve this issue.

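The field that stores the file content is configured in schema.xml as follows:

  ... indexed="true" stored="true" omitNorms="true" termVectors="true"
  termPositions="true" termOffsets="true" />

and highlighting is configured as follows:

  on
  ${all.fields.list}
  500
  ... name="f.Content.hl.useFastVectorHighlighter">true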

2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - 
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:503)
	at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:263)
	at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:284)
	at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:759)
	at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:510)
	at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:234)
	at org.apache.lucene.search.vectorhighlight.FieldTermStack.<init>(FieldTermStack.java:83)
	at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:175)
	at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:509)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:215)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:279)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
	at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:91)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:162)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:326)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:227)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:170)
	at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:822)
	at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:719)
	at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1013)

Thanks & Regards
Anand Nigam
Developer


***
 
The Royal Bank of Scotland plc. Registered in Scotland No 90312. 
Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority. The 
Royal Bank of Scotland N.V. is authorised and regulated by the 
De Nederlandsche Bank and has its seat at Amsterdam, the 
Netherlands, and is registered in the Commercial Register under 
number 33002587. Registered Office: Gustav Mahlerlaan 350, 
Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and 
The Royal Bank of Scotland plc are authorised to act as agent for each 
other in certain jurisdictions. 
  
This e-mail message is confidential and for use by the addressee only. 
If the message is received by anyone other than the addressee, please 
return the message to the sender by replying to it and then delete the 
message from your computer. Internet e-mails are not necessarily 
secure. The Royal Bank of Scotland plc and The Royal Bank of Scotland 
N.V. including its affiliates ("RBS group") does not accept responsibility 
for changes made to this message after it was sent. For the protection
of RBS group and its clients and customers, and in compliance with
regulatory requirements, the contents of both incoming and outgoing
e-mail communications, which could include proprietary information and
Non-Public Personal Info

RE: OutOfMemoryError coming from TermVectorsReader

2011-09-22 Thread Anand.Nigam
Hi,

I am trying to index application log files and some database tables. The log
files range in size from 1 MB to 100 MB, and the database tables have a few
thousand rows.

I am using the term vector highlighter (FastVectorHighlighter) for the content
of the log files. My environment:

Heap size : 10 GB 
OS: Linux, 64 bit
Solr version : 3.4.0

Thanks & Regards
Anand



Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   

-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: 19 September 2011 16:52
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError coming from TermVectorsReader

Please include information about your heap size (and other Java command-line
arguments), as well as the platform OS (version, swap size, etc.), Java version,
and underlying hardware (RAM, etc.) so that we can better help you.

From the information you have given, increasing your heap size should help.

Thanks,
Glen

http://zzzoot.blogspot.com/
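Glen's first question is the heap size. One way to confirm what heap the JVM actually received (rather than what a start script was meant to pass) is to ask the runtime itself. A minimal sketch, assuming you can run code in, or with the same -Xmx options as, the servlet container's JVM; it only reports the limit of the JVM it runs in and says nothing about how much memory one highlighting request needs. The class name is illustrative.

```java
// Sketch: print the heap limit and platform details of the JVM this runs in.
// Run it with the same JVM options as the servlet container to compare.
public class HeapReport {

    // Maximum heap in MiB, or -1 if the JVM reports no limit
    // (Runtime.maxMemory() returns Long.MAX_VALUE in that case).
    static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        System.out.println(mib < 0 ? "Max heap: no limit reported" : "Max heap: " + mib + " MiB");
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("OS: " + System.getProperty("os.name") + " "
                + System.getProperty("os.version") + " (" + System.getProperty("os.arch") + ")");
    }
}
```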



RE: OutOfMemoryError coming from TermVectorsReader

2011-09-23 Thread Anand.Nigam
Thanks Otis,

I am able to show results such that the last match in the log file (500
characters around the match) is highlighted. I can try creating multiple
documents from one log file to see if that improves performance.

Can anything else be done to reduce heap usage?

Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: 23 September 2011 09:35
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError coming from TermVectorsReader

Anand,

But do you really want the whole log file to be a single Solr document (from a
cursory look at the thread, it seems that is the case)? Why not break up a log
file into multiple documents? E.g., each log message could be one Solr document.
Not only will that solve your memory issues, but I think it also makes more
sense if the intention is for a person to do a search and then look at the
matched log messages: it is much easier to point a person to a short log doc
than to a giant one through which they then have to do a manual find.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/
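Otis's suggestion above (one Solr document per log message) needs a rule for where a message starts. A minimal sketch, assuming every message begins with a timestamp in the format of the log line quoted in this thread; continuation lines such as stack-trace frames stay with the message that produced them. The class name and the pattern are illustrative, not part of Solr, and the resulting strings would still have to be sent to Solr as separate documents by your indexing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: group raw log lines into one string per log message.
// Assumption: a message starts with a timestamp like "2011-09-16 09:38:45.763";
// any other line (e.g. a stack-trace "at ..." line) continues the previous message.
public class LogMessageSplitter {

    private static final Pattern MESSAGE_START =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3} .*");

    static List<String> split(List<String> lines) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean startsMessage = MESSAGE_START.matcher(line).matches();
            if (startsMessage || current == null) {
                if (current != null) {
                    messages.add(current.toString()); // close the previous message
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);    // continuation line
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - ",
                "java.lang.OutOfMemoryError: Java heap space",
                "2011-09-16 09:38:46.000 [example] INFO - a second, made-up message");
        for (String message : split(lines)) {
            System.out.println("--- message ---");
            System.out.println(message);
        }
    }
}
```

Splitting on a start pattern rather than on every newline keeps multi-line entries searchable as one unit, which matters for stack traces like the one in this thread.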


> On Mon, Sep 19, 2011 at 1:34 AM,   wrote:
>> Hi,
>>
>> I am new to solr. I am trying to index text documents of large size. On
>> searching from indexed documents I am getting following OutOfMemoryError.
>> Please help me in resolving this issue.
>>
>> The field which stores file content is configured in schema.xml as below:
>>
>>   indexed="true" stored="true" omitNorms="true" termVectors="true"
>>   termPositions="true" termOffsets="true" />
>>
>> and Highlighting is configured as below:
>>
>>   on
>>   ${all.fields.list}
>>   500
>>   name="f.Content.hl.useFastVectorHighlighter">true
>>

RE: Can Solr handle large text files?

2011-10-21 Thread Anand.Nigam
Hi,

I was also facing the issue of highlighting large text files. I applied the
solution proposed here and it worked, but I am getting the following error.

Basically, 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I get
this file from? It is referenced in browse.vm:


  #if($response.response.get('grouped'))
    #foreach($grouping in $response.response.get('grouped'))
      #parse("hitGrouped.vm")
    #end
  #else
    #foreach($doc in $response.results)
      #parse("hit.vm")
    #end
  #end



HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or
'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
cwd=C:\glassfish3\glassfish\domains\domain1\config

java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', cwd=C:\glassfish3\glassfish\domains\domain1\config
	at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
	at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
	at org.apache.velocity.Template.process(Template.java:98)
	at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
	at 

Thanks & Regards,
Anand
Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   


-Original Message-
From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de] 
Sent: 21 October 2011 14:58
To: solr-user@lucene.apache.org
Subject: Re: Can Solr handle large text files?

Hi Peter,

Highlighting in large text files cannot be fast without dividing the original
text into small pieces.
So take a look at
http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
and at
http://www.lucidimagination.com/blog/2010/09/16/2446/

This means that you should divide your files and use Result Grouping / Field
Collapsing to list only one hit per original document.

(XTF would also solve your problem "out of the box", but XTF does not use Solr.)

Best regards
  Karsten
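Karsten's advice (split files, then collapse hits back to one per original document) is what addresses Peter's "1 result, not 5" complaint below. A conceptual sketch in plain Java of what collapsing does to a result list; it is not how Solr implements grouping, and in practice you would let Solr do it server-side (Anand's later query in this thread uses group=true&group.field=FileName). The Hit class and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of field collapsing: keep only the best-scoring chunk per
// original file. The Hit class is hypothetical and stands in for a search result.
public class CollapseByFile {

    static final class Hit {
        final String fileName; // value shared by all chunks of one original file
        final String chunkId;  // unique id of the chunk document
        final float score;

        Hit(String fileName, String chunkId, float score) {
            this.fileName = fileName;
            this.chunkId = chunkId;
            this.score = score;
        }
    }

    static List<Hit> collapse(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            Hit seen = best.get(h.fileName);
            if (seen == null || h.score > seen.score) {
                best.put(h.fileName, h); // new best chunk for this file
            }
        }
        List<Hit> result = new ArrayList<>(best.values());
        // One entry per file, highest score first.
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("syslog", "syslog#1", 0.9f),
                new Hit("syslog", "syslog#4", 1.7f),
                new Hit("mail.log", "mail.log#2", 1.2f));
        for (Hit h : collapse(hits)) {
            System.out.println(h.fileName + " -> " + h.chunkId + " (" + h.score + ")");
        }
    }
}
```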

-------- Original Message --------
> Date: Thu, 20 Oct 2011 17:59:04 -0700
> From: Peter Spam 
> To: solr-user@lucene.apache.org
> Subject: Can Solr handle large text files?

> I have about 20k text files, some very small, but some up to 300MB, 
> and would like to do text searching with highlighting.
> 
> Imagine the text is the contents of your syslog.
> 
> I would like to type in some terms, such as "error" and "mail", and 
> have Solr return the syslog lines with those terms PLUS two lines of context.
> Pretty much just like Google's highlighting.
> 
> 1) Can Solr handle this?  I had extremely long query times when I
> tried this with Solr 1.4.1 (yes, I was using TermVectors, etc.).  I
> tried breaking the files into 1MB pieces, but searching would be wonky
> => it returned the wrong number of documents (i.e. if one file had a term 5
> times, and that was the only file that had the term, I want 1 result, not 5
> results).
> 
> 2) What sort of tokenizer would be best?  Here's what I'm using:
> 
>   ... multiValued="false" termVectors="true" termPositions="true"
>   termOffsets="true" />
> 
>   ... generateWordParts="0" generateNumberParts="0" catenateWords="0"
>   catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> Thanks!
> Pete


RE: Can Solr handle large text files?

2011-10-31 Thread Anand.Nigam
Hi,

Basically, I need to index very large log files. I have modified
ExtractingDocumentLoader to create a new document for every 50 lines of the log
file being indexed (the number of lines is configurable through a system
property). The 'Filename' field is kept the same for all documents created from
one log file, and a unique id is generated by appending the line numbers to the
file name, e.g. 'log.txt (line no. 100 -150)'. Each doc is given a custom score,
stored in a field called 'custom_score', which is directly proportional to its
distance from the beginning of the file.
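The chunking described above can be sketched on its own, outside Solr. The author's modified ExtractingDocumentLoader is not shown, so this is not that code; it is a minimal sketch assuming a hypothetical system property name ("log.chunk.lines"), an id format modelled loosely on the example above, and the chunk's 0-based starting line as the position score. Mapping each Chunk onto the filename, id, content and score fields is left to the indexing code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of fixed-size chunking of one log file into separate documents.
// Assumptions (not from the original post): the property name "log.chunk.lines",
// the exact id format, and the 0-based start line used as the position score.
public class LogChunker {

    static final class Chunk {
        final String fileName;   // identical for all chunks of one file (grouping key)
        final String id;         // unique per chunk: file name plus line range
        final String content;    // the chunk's lines joined with '\n'
        final int positionScore; // grows with distance from the start of the file

        Chunk(String fileName, String id, String content, int positionScore) {
            this.fileName = fileName;
            this.id = id;
            this.content = content;
            this.positionScore = positionScore;
        }
    }

    // Lines per chunk, overridable with -Dlog.chunk.lines=N (hypothetical name).
    static int linesPerChunk() {
        return Integer.getInteger("log.chunk.lines", 50);
    }

    static List<Chunk> chunk(String fileName, List<String> lines, int linesPerChunk) {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerChunk) {
            int end = Math.min(start + linesPerChunk, lines.size()); // exclusive index
            String content = String.join("\n", lines.subList(start, end));
            // 1-based, inclusive line numbers, e.g. "log.txt (line no. 51-100)"
            String id = fileName + " (line no. " + (start + 1) + "-" + end + ")";
            chunks.add(new Chunk(fileName, id, content, start));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            lines.add("line " + i);
        }
        for (Chunk c : chunk("log.txt", lines, linesPerChunk())) {
            System.out.println(c.id + "  score=" + c.positionScore);
        }
    }
}
```

Because every chunk of a file shares the same fileName, grouping on that field brings the chunks back together at query time, while the per-chunk term vectors stay small.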

I have also found 'hitGrouped.vm' on the net. Since I am reading only 50 lines
for each document, the default max chunk size works for me, but it can easily
be adjusted depending on the number of lines you read per doc.

I group the results on the 'filename' field and show the results from the docs
with the highest score; as a result, I am able to show the last matching results
from the log file. The query parameters I am using for search are:

http://localhost:8080/solr/select?defType=dismax&qf=Content&q=Solr&fl=id,score&bf=sub(1000,caprice_score)&group=true&group.field=FileName

The results are amazing: I am able to index and search very large log files
(a few hundred MB) with very low memory requirements. Highlighting is also
working fine.
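For reference, the query parameters above can also be assembled programmatically. A minimal sketch using only the JDK (java.net.URLEncoder with a Charset argument, available since Java 10) that sends each parameter once and URL-encodes the values, so user queries containing spaces or special characters survive the trip. The base URL and field names (Content, caprice_score, FileName) are copied from the post; the class and method names are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build the grouped dismax query URL with encoded parameter values.
public class GroupedQueryUrl {

    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    // Parameters as used in the post; only the user query varies.
    static Map<String, String> params(String userQuery) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "dismax");
        p.put("qf", "Content");
        p.put("q", userQuery);
        p.put("fl", "id,score");
        p.put("bf", "sub(1000,caprice_score)");
        p.put("group", "true");
        p.put("group.field", "FileName");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/solr/select", params("Solr")));
    }
}
```

URLEncoder performs form encoding, so a space in the query becomes `+`, which Solr's servlet container decodes back to a space.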

Thanks & Regards,
Anand





Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   

-Original Message-
From: Peter Spam [mailto:ps...@mac.com] 
Sent: 21 October 2011 23:04
To: solr-user@lucene.apache.org
Subject: Re: Can Solr handle large text files?

Thanks for your note, Anand.  What was the maximum chunk size for you?  Could 
you post the relevant portions of your configuration file?


Thanks!
Pete


Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
Hi,

I am new to Solr. I have downloaded the Solr 3.3.0 distribution and am trying to
run it using java -jar start.jar from the apache-solr-3.3.0\example directory
(start.jar is present there). But I get the following error when running this
command:

C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar
java.lang.NullPointerException
	at java.io.File.<init>(File.java:222)
	at org.mortbay.start.Main.init(Main.java:465)
	at org.mortbay.start.Main.start(Main.java:439)
	at org.mortbay.start.Main.main(Main.java:119)
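The top frame of the trace is java.io.File's constructor (File.java:222), and File's String constructor is documented to throw NullPointerException when the pathname is null. That suggests, though the trace alone cannot confirm it, that org.mortbay.start.Main passed a null path at Main.java:465, for example a value it expected to find set in the environment (a hypothesis, not a diagnosis). A small demo of that behavior, which also prints a couple of values that may be worth including when describing the environment:

```java
import java.io.File;

// Illustration: java.io.File's String constructor throws NullPointerException
// for a null path, consistent with the top frame of the trace above.
public class NullFileDemo {

    // True if building a File from this path throws NullPointerException.
    static boolean throwsNpe(String path) {
        try {
            new File(path);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // System.getProperty returns null for an unset property; the name below
        // is hypothetical and only stands in for "some value that was not set".
        String unset = System.getProperty("example.unset.property");
        System.out.println("File(null) throws NPE: " + throwsNpe(unset));
        // Values worth reporting alongside the Java and OS versions.
        System.out.println("user.dir  = " + System.getProperty("user.dir"));
        System.out.println("java.home = " + System.getProperty("java.home"));
    }
}
```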

Could someone help me resolve this issue?

Thanks & Regards
Anand Nigam




RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
 
Thanks for your reply, Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows XP Professional Version 2002 Service Pack 3

Interestingly, my colleagues who have the same environment are not facing this
problem.

Thanks & Regards
Anand Nigam

-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: 27 July 2011 20:21
To: solr-user@lucene.apache.org
Subject: RE: Problem starting solr on jetty

Hi Anand,

Someone else reported this exact same error with Solr v1.4.0: 
http://www.lucidimagination.com/search/document/fd5b83f3595a1c6c/can_t_start_solr_by_java_jar_start_jar

I downloaded the apache-solr-3.3.0.zip, unpacked it, then ran 'java -jar 
start.jar' from the cmdline.  It worked.  (Windows 7; Oracle Java 1.6.0_23).

I tried to reproduce the error you're seeing, by making the example\ directory 
and all its contents read-only (different exception: FileNotFound), and by 
removing the entire contents of the example\ directory except for start.jar 
(nothing happens - it just quits without printing anything out).

Can you give more details about your environment?

Steve



RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
 
Thanks for your reply Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows  XP Professional Version 2002 Service Pack 3

Interestingly my colleagues who have the same environment are not facing this 
problem.

Thanks & Regards
Anand Nigam

-Original Message-
From: Nigam, Anand, GBM 
Sent: 28 July 2011 08:37
To: solr-user@lucene.apache.org
Subject: RE: Problem starting solr on jetty

 
Thanks for your reply Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows  XP Professional Version 2002 Service Pack 3

Interestingly my colleagues who have the same environment are not facing this 
problem.

Thanks & Regards
Anand Nigam

-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: 27 July 2011 20:21
To: solr-user@lucene.apache.org
Subject: RE: Problem starting solr on jetty

Hi Anand,

Someone else reported this exact same error with Solr v1.4.0: 
http://www.lucidimagination.com/search/document/fd5b83f3595a1c6c/can_t_start_solr_by_java_jar_start_jar

I downloaded the apache-solr-3.3.0.zip, unpacked it, then ran 'java -jar 
start.jar' from the cmdline.  It worked.  (Windows 7; Oracle Java 1.6.0_23).

I tried to reproduce the error you're seeing, by making the example\ directory 
and all its contents read-only (different exception: FileNotFound), and by 
removing the entire contents of the example\ directory except for start.jar 
(nothing happens - it just quits without printing anything out).

Can you give more details about your environment?

Steve

-Original Message-
From: anand.ni...@rbs.com [mailto:anand.ni...@rbs.com]
Sent: Wednesday, July 27, 2011 7:25 AM
To: solr-user@lucene.apache.org
Subject: Problem starting solr on jetty

Hi,

I am new to Solr. I have downloaded the Solr 3.3.0 distribution and am trying to 
run it using java -jar start.jar from the apache-solr-3.3.0\example directory 
(start.jar is present there). However, I get the following error when I run this 
command:

C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar 
java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.mortbay.start.Main.init(Main.java:465)
at org.mortbay.start.Main.start(Main.java:439)
at org.mortbay.start.Main.main(Main.java:119)

Could someone help me resolve this issue?

Thanks & Regards
Anand Nigam


***
The Royal Bank of Scotland plc. Registered in Scotland No 90312. 
Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority. The Royal Bank of 
Scotland N.V. is authorised and regulated by the De Nederlandsche Bank and has 
its seat at Amsterdam, the Netherlands, and is registered in the Commercial 
Register under number 33002587. Registered Office: Gustav Mahlerlaan 350, 
Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and The Royal Bank 
of Scotland plc are authorised to act as agent for each other in certain 
jurisdictions. 
  
This e-mail message is confidential and for use by the addressee only. 
If the message is received by anyone other than the addressee, please return 
the message to the sender by replying to it and then delete the message from 
your computer. Internet e-mails are not necessarily secure. The Royal Bank of 
Scotland plc and The Royal Bank of Scotland N.V. including its affiliates ("RBS 
group") does not accept responsibility for changes made to this message after 
it was sent. For the protection of RBS group and its clients and customers, and 
in compliance with regulatory requirements, the contents of both incoming and 
outgoing e-mail communications, which could include proprietary information and 
Non-Public Personal Information, may be read by authorised persons within RBS 
group other than the intended recipient(s). 

Whilst all reasonable care has been taken to avoid the transmission of viruses, 
it is the responsibility of the recipient to ensure that the onward 
transmission, opening or use of this message and any attachments will not 
adversely affect its systems or data. No responsibility is accepted by the RBS 
group in this regard and the recipient should carry out such virus and other 
checks as it considers appropriate. 

Visit our website at www.rbs.com 

***
  


RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
Hi All,

I tried to debug the issue by running start.jar in the Eclipse debugger and found 
that the root cause was that the jetty.home system property was not set. If I set 
the jetty.home property, the server starts properly.

Thanks,
Anand
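The NullPointerException in the original report appears to be thrown from a java.io.File constructor called by org.mortbay.start.Main.init, which fits the finding that jetty.home was unset: File's String constructor throws NullPointerException when given null. The Java sketch below illustrates that failure mode and one possible guard. It assumes the launcher reads the property through System.getProperty("jetty.home"); the class JettyHomeCheck and its fallback to the working directory (user.dir) are hypothetical illustrations, not Jetty's actual start code. The workaround described above corresponds to passing the property explicitly, for example java -Djetty.home=. -jar start.jar from the example directory.

```java
import java.io.File;

// Hypothetical illustration only: this is NOT the code of org.mortbay.start.Main.
// It assumes the launcher derives its home directory from the "jetty.home"
// system property, as the debugging note above suggests.
public class JettyHomeCheck {

    // Resolve the home directory, falling back to the current working
    // directory (the "user.dir" property) when jetty.home is unset, so that
    // null is never passed to the File constructor.
    public static File resolveJettyHome() {
        String home = System.getProperty("jetty.home");
        if (home == null) {
            home = System.getProperty("user.dir");
        }
        return new File(home);
    }

    public static void main(String[] args) {
        // Demonstrate the failure mode: File(String) rejects a null path
        // with a NullPointerException, like the trace in the original report.
        try {
            new File((String) null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("new File(null) threw NullPointerException");
        }

        // With the guard in place, a path is always produced.
        System.out.println("jetty.home resolves to " + resolveJettyHome().getPath());
    }
}
```

Running the class with java -Djetty.home=. JettyHomeCheck should report the supplied path rather than the working directory; without the -D option it falls back to user.dir instead of failing.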

-Original Message-
From: Nigam, Anand, GBM 
Sent: 28 July 2011 08:39
To: solr-user@lucene.apache.org
Subject: RE: Problem starting solr on jetty

 
Thanks for your reply, Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows XP Professional Version 2002 Service Pack 3

Interestingly, my colleagues who have the same environment are not facing this 
problem.

Thanks & Regards
Anand Nigam





Highlighting does not work with uniqueField set

2011-08-03 Thread Anand.Nigam
Hi,

I am new to Solr. I am facing an issue wherein highlighting of matches in the 
search results does not work when I set a unique field as:

id

If this is commented out, highlighting starts working. However, I need to have a 
unique field. Could someone please explain this erratic behaviour? I am setting 
this field while posting the documents to be indexed.

Thanks & Regards,
Anand




java.lang.IllegalStateException: Committed error in the logs

2011-08-03 Thread Anand.Nigam
I am getting the following error in the logs when I try to search. Any idea why 
this error occurs? Search results come back only after a long delay.



SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:149)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more

2011-08-04 06:05:10.550:WARN::Committed before 500 null||org.mortbay.jetty.EofException|?at 
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at 
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)|?at 
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)|?at 
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)|?at 
org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)|?at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)|?at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)|?at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)|?at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)|?at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)|?at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)|?at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)|?at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)|?at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)|?at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)|?at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)|?at 
org.mortbay.jetty.Server.handle(Server.java:326)|?at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)|?at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)|?at 
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)|?at 
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)|?at 
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)|?at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)|?at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)|Caused by: java.ne