Re: How to deal with many files using solr external file field

2011-06-07 Thread Simon Rosenthal
Can you provide a stack trace for the OOM exception?

On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven
wrote:

> Hi all,
>
> we're using Solr 1.4 and external file field ([1]) for sorting our
> search results. We have about 40,000 terms for which we use this sorting
> option.
> Currently we're running into massive OutOfMemory problems and we're not
> quite sure what the cause is. It seems that the garbage collector stops
> working or some processes are going wild. In any case, Solr allocates
> more and more RAM until we hit an OutOfMemory exception.
>
>
> We noticed the following:
>
> For some terms, the Solr log shows
> java.io.FileNotFoundExceptions when Solr tries to load an external file for
> a term for which no such file exists, e.g. Solr tries to load the
> external score file for "trousers" but there is none in the
> /solr/data folder.
>
> Question: is it possible that those exceptions are responsible for the
> OutOfMemory problem, or could it be due to the large(?) number of 40k terms
> for which we want to sort the results via external file field?
>
> I'm looking forward to your answers, suggestions and ideas :)
>
>
> Regards
> Sven
>
>
> [1]:
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>
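
One way to follow up on the FileNotFoundException observation is to list which external files are actually missing. The following is a minimal Python sketch, not a diagnosis of the OutOfMemory problem itself. It assumes the files follow the external_<fieldname> naming convention (optionally with an extension) in the Solr data directory; the function name, the field names and the directory are hypothetical.

```python
import os
import tempfile


def missing_external_files(data_dir, field_names):
    """Return the field names that have no external_<fieldname> file in data_dir.

    Assumes the ExternalFileField naming convention: a file called
    external_<fieldname>, optionally followed by an extension.
    """
    present = os.listdir(data_dir)
    missing = []
    for name in field_names:
        base = "external_" + name
        # Accept either the bare file name or the same name with an extension.
        if not any(f == base or f.startswith(base + ".") for f in present):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Self-contained demo using a throwaway directory and made-up field names.
    with tempfile.TemporaryDirectory() as data_dir:
        open(os.path.join(data_dir, "external_shoes"), "w").close()
        print(missing_external_files(data_dir, ["shoes", "trousers"]))
```

Running such a check against the real data directory and the list of sort fields would show whether "trousers" is an isolated case or one of many missing files.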


Re: Find newly added documents

2010-01-23 Thread Simon Rosenthal
"newly added" is a bit vague. Do you mean "since last Sunday"? "between
the last  and the one before that"? Also, do you need to
distinguish between updated and newly added documents?

Perhaps you could be more specific about the use case.

-Simon

On Fri, Jan 22, 2010 at 4:25 AM, Erik Hatcher wrote:

> You can do a search, sort by the special _docid_ "field" (underscores
> mandatory) descending and the top documents listed will be the latest added.
>
> Like this, un-url-encoded:   q=*:*&sort=_docid_ desc
>
>Erik
>
>
>
> On Jan 22, 2010, at 3:39 AM, Sandeep Tagore wrote:
>
>
>> Thanks a lot Erik. Is there any other alternate way?
>> Thanks a lot for your response.
>>
>> Regards,
>> Sandeep
>>
>>
>> You'll be able to find them only after a commit.
>>
>> One way to do this is index a timestamp with every document, and find
>> the latest ones using that field.  There's an example of an automatic
>> timestamp field in the example schema.
>>
>>Erik
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Find-newly-added-documents-tp27254813p27270104.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
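
The two suggestions in this thread (sorting on _docid_ and querying a timestamp field) can be sketched as URL-encoded query strings. This is a minimal Python sketch under stated assumptions: the field name "timestamp", the NOW-1DAY window and the function names are illustrative and not taken from the thread.

```python
from urllib.parse import urlencode


def newest_by_docid(rows=10):
    """Query string for the _docid_ suggestion: match all, newest doc ids first."""
    return urlencode({"q": "*:*", "sort": "_docid_ desc", "rows": rows})


def newest_by_timestamp(field="timestamp", window="NOW-1DAY", rows=10):
    """Query string for the timestamp approach: range query plus descending sort.

    The field name and the default window are assumptions for illustration.
    """
    return urlencode({
        "q": "%s:[%s TO NOW]" % (field, window),
        "sort": "%s desc" % field,
        "rows": rows,
    })


if __name__ == "__main__":
    print(newest_by_docid())
    print(newest_by_timestamp())
```

Either string would be appended to the select URL of the Solr instance. As noted in the thread, documents only become visible after a commit.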


Re: unloading a solr core doesn't free any memory

2010-02-08 Thread Simon Rosenthal
What garbage collection parameters is the JVM using? The memory will not
always be freed immediately after an event like unloading a core or starting
a new searcher.

2010/2/8 Tim Terlegård 

> To me it doesn't look like unloading a Solr Core frees the memory that
> the core has used. Is this how it should be?
>
> I have a big index with 50 million documents. After loading a core it
> takes 300 MB RAM. After a query with a couple of sort fields Solr
> takes about 8 GB RAM. Then I unload (CoreAdminRequest.unloadCore) the
> core. The core is not shown in /solr/ anymore. Solr still takes 8 GB
> RAM. Creating new cores is super slow because I have hardly any memory
> left. Do I need to free the memory explicitly somehow?
>
> /Tim
>


Re: HttpDataSource consume REST API with Authentication required

2010-03-04 Thread Simon Rosenthal
http://issues.apache.org/jira/browse/SOLR-1490 has a patch which will do
what you want.

-Simon

On Thu, Mar 4, 2010 at 2:21 PM, javaxmlsoapdev  wrote:

>
> I have to use HttpDataSource
> (http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource)
> to have Solr consume my REST service and index the data returned
> from that service. My application/service has authentication/authorization.
> When Solr invokes this service it MUST have valid credentials.
> How/where do I configure/write the authentication part before Solr consumes my
> REST service?
>
> Any pointers would be appreciated.
>
> Thanks,
>
> --
> View this message in context:
> http://old.nabble.com/HttpDataSource-consume-REST-API-with-Authentication-required-tp27785340p27785340.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
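
The thread does not say which authentication scheme the REST service uses, nor how the SOLR-1490 patch is configured. Purely as an illustration of what "valid credentials" means at the HTTP level, and assuming HTTP Basic authentication, the header a client has to send can be sketched in Python as follows. The function name and the credentials are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication.

    Basic auth is an assumption here; the thread does not name the scheme.
    """
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


if __name__ == "__main__":
    # Made-up credentials, for demonstration only.
    print(basic_auth_header("user", "pass"))
```

Whatever fetches the data on Solr's behalf would need to attach an equivalent header (or the equivalent for another scheme) to each request.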


Re: Indexing a word in url

2008-04-02 Thread Simon Rosenthal
I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new
field ('pathparts'), plus a few lines of code to generate the tokens in our
content preprocessor, which submits documents to Solr for indexing.

-Simon

On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : Actually I want to use anything that is not a letter or digit as the
> : separator - anything between them will be a word (so that I can use the
> : URL fragment to see what is indexed about this site)...any suggestion?
>
> In addition to Mike's suggestion of trying out the WordDelimiterFilter,
> take a look at the PatternTokenizerFactory.
>
>
>
> -Hoss
>
>
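
The preprocessing approach described at the top of this message (generating the tokens before the document is submitted) can be sketched in a few lines of Python. This is not the original preprocessor code, only a minimal sketch: the function name is hypothetical, and lowercasing the tokens is an added assumption.

```python
import re


def path_parts(url):
    """Split a URL into tokens at every run of non-alphanumeric characters.

    Lowercasing is an assumption; drop .lower() to keep the original case.
    """
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


if __name__ == "__main__":
    print(path_parts("http://www.example.com/news/2008/solr-tips.html"))
```

The resulting list would be sent as the values of a multi-valued field such as 'pathparts', so no special tokenizer is needed on the Solr side.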


Re: SolrClient from inside processAdd function

2019-09-04 Thread Simon Rosenthal
Similarly, I had considered a URP which would call the Solr Tagger to add
new metadata fields to incoming documents for indexing (and recall
discussing this with David Smiley), but eventually decided against this
approach on the grounds of complexity.

-Simon

On Wed, Sep 4, 2019 at 2:10 PM Arnold Bronley 
wrote:

> I need to search some other collection inside processAdd function and
> append that information to the indexing request.
>
> On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson 
> wrote:
>
> > This really sounds like an XY problem. What do you need the SolrClient
> > _for_? I suspect there’s an easier way to do this…..
> >
> > Best,
> > Erick
> >
> > > On Sep 3, 2019, at 6:17 PM, Arnold Bronley wrote:
> > >
> > > Hi,
> > >
> > > Is there a way to create a SolrClient from inside the processAdd function of
> > > a custom update processor, for the same Solr instance on which it is executing?
> >
> >
>


-- 
I am transferring my email  from Yahoo to simon.rosent...@gmail.com. I will
continue to receive Yahoo email but will reply from this account. Please
update your address lists accordingly.


Re: Solr Text Tagger | All tags in desc order

2019-10-04 Thread Simon Rosenthal
Hi Vipul:

I'm not sure what you mean by 'score' in this context, as tagging requests
do not return a standard Solr/Lucene score. If you're looking for the
number of times a specific tag occurs in the tagged text, then you'll need
to calculate that in your application from the returned JSON.

HTH

-Simon

On Fri, Oct 4, 2019 at 5:41 AM Vipul Sharma  wrote:

> Hi All,
>
> After putting all the master data in Solr Text Tagger, I want to parse
> resume text to fetch the top five skills based on their score. Is there any
> way to fetch the results in descending order?
>
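
Counting tag occurrences in the application, as suggested in the reply above, could look roughly like this in Python. It is a minimal sketch that assumes the JSON response has a "tags" list in which each entry is a flat [key, value, key, value, ...] list containing an "ids" key; the sample data and the function name are made up, and mapping ids back to skill names is left to the application.

```python
from collections import Counter


def top_tags(tagger_response, n=5):
    """Count how often each tag id occurs and return the n most frequent.

    Assumes each entry of "tags" is a flat key/value list with an "ids" key.
    """
    counts = Counter()
    for tag in tagger_response.get("tags", []):
        fields = dict(zip(tag[0::2], tag[1::2]))  # flat list -> dict
        counts.update(fields.get("ids", []))
    # most_common() returns (id, count) pairs in descending order of count.
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up response in the assumed shape.
    sample = {
        "tags": [
            ["startOffset", 0, "endOffset", 4, "ids", ["java"]],
            ["startOffset", 10, "endOffset", 14, "ids", ["solr"]],
            ["startOffset", 20, "endOffset", 24, "ids", ["java"]],
        ]
    }
    print(top_tags(sample))
```

Here the occurrence count stands in for a "score"; the tagger itself does not rank its matches.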