Re: Question on Lots Of cores - How do I know it's working

2013-11-08 Thread vybe3142
On a related note... In our application, the cores can get moderately large, and since we mostly use a subset of them on a roughly LRU basis, the dynamic core loading seems a good fit. We interact with our solr server via a solrj client. That said, we do require the capability to access older c

Re: Question on Lots Of cores - How do I know it's working

2013-11-08 Thread vybe3142
Thanks so much for the answer, and for "JIRA-fying" it.

Question on Lots Of cores - How do I know it's working

2013-11-07 Thread vybe3142
As I understand it, the "lots of cores" feature enables dynamic loading and unloading of cores. This is how I set up my solr.xml for a test where I created more cores than the transientCacheSize. Here is a link to the config in case it doesn't format well via this post. https://gist.github.com/ano
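
For reference, a minimal legacy-style solr.xml for this kind of test might look like the sketch below; the core names, directories, and cache size are illustrative, not the contents of the gist above.

    <!-- Sketch: more transient cores than transientCacheSize, so the least recently
         used transient core is evicted once the cache is full. -->
    <solr persistent="true">
      <cores adminPath="/admin/cores" transientCacheSize="2">
        <core name="core1" instanceDir="core1" transient="true" loadOnStartup="false"/>
        <core name="core2" instanceDir="core2" transient="true" loadOnStartup="false"/>
        <core name="core3" instanceDir="core3" transient="true" loadOnStartup="false"/>
      </cores>
    </solr>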

Re: Language Identification and Stemming

2013-03-01 Thread vybe3142
From your response, I gather that there's no way to maintain a single set of fields for multiple languages, i.e. I can't use a field "text" for the body text. Instead, I would have to define text_en, text_fr, text_ru etc., each mapped to their specific languages.

What am I doing wrong - writing an OpenNLP Filter

2013-02-28 Thread vybe3142
Since the official OpenNLP filter is not yet in an actual release, I'm experimenting with the OpenNLP filter implementation described in chapter 8 of the Taming Text book ( http://www.manning.com/ingersoll/Sample-ch08.pdf ). The original code is at: https://github.com/tamingtext/book/tree/master/src

Re: Multi Core / On demand loading

2013-02-14 Thread vybe3142
Thanks, We run SOLR 4.0 in production. Yesterday, I ported our configuration to 4.1 on my local workstation. I just looked at the SOLR-4400 fix versions and as per the info, I might wait till 4.2 before porting.

Mahout - Solr vs Mahout Lucene Question

2013-01-24 Thread vybe3142
Hi, I hate to double post but I'm not sure in which domain the answer to my question lies, so here's the link to my question on the Mahout groups. Basically, I'm getting different clustering results depending on whether I index data with SOLR or Lucene. Please post any responses against the origi

SOLR 4 / Tomcat Startup Error: java.lang.NoClassDefFoundError: org/apache/lucene/codecs/sep/IntStreamFactory

2012-10-29 Thread vybe3142
I could well be doing something wrong here, but so far I haven't figured it out. I currently run SOLR 4 BETA / multicore and I was investigating migrating to SOLR 4.0 (on my workstation). I've even backed out my custom schema and solrconfig so I'm running as close to original as possible with no

Re: Can SOLR Index UTF-16 Text

2012-10-03 Thread vybe3142
Thanks for all the responses. Problem partially solved (see below). 1. In a sense, my question is theoretical since the input to our SOLR server is (currently) UTF-8 files produced by a third party text extraction utility (not Tika). On the server side, we read and index the text via a custom data
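
As an aside, when the input encoding isn't guaranteed, a small BOM sniff on the handler side is one way to pick the right charset before the text reaches the index. This is a generic sketch, not the custom handler from this thread; the class and method names are made up.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.Charset;

    public class CharsetSniff {
        // If a UTF-16 byte-order mark is present, use Java's "UTF-16" charset,
        // which consumes the BOM and picks the right endianness; otherwise assume UTF-8.
        public static Charset detect(File f) throws IOException {
            InputStream in = new FileInputStream(f);
            try {
                int b0 = in.read();
                int b1 = in.read();
                boolean utf16 = (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
                return Charset.forName(utf16 ? "UTF-16" : "UTF-8");
            } finally {
                in.close();
            }
        }
    }

Reading the file through new InputStreamReader(new FileInputStream(f), CharsetSniff.detect(f)) then yields clean Unicode text that can be added to a document field regardless of the on-disk encoding.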

Can SOLR Index UTF-16 Text

2012-09-27 Thread vybe3142
Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8 files. Recently, however, we noticed that it has issues with indexing certain text files, e.g. UTF-16 files. See attachment for an example (tarred+zipped): tesla-utf16.txt

Re: 4.0.snapshot to 4.0.beta index migration

2012-09-27 Thread vybe3142
Thanks, that's what we decided to do too.

4.0.snapshot to 4.0.beta index migration

2012-09-20 Thread vybe3142
Hi We have a bunch of data that was indexed using a 4.0 snapshot build of Solr. We'd like to migrate to the 4.0.beta version. Is there a recommended way to migrate the indices or is reindexing the best option? Thanks

LotsOfCores: Any alternative approaches till it's ready

2012-09-19 Thread vybe3142
LotsOfCores ( http://wiki.apache.org/solr/LotsOfCores ) is intended to dynamically juggle loading (and unloading) of required cores where the total number of cores is very large. We're approaching that situation, but it looks like LotsOfCores isn't quite ready for prime time yet. Are there any othe

StatsComponent or ...?

2012-05-10 Thread vybe3142
Hi, My requirement is to calculate the sum of a certain field across the results. StatsComponent does what I need. Questions: 1. I don't need to calculate min, max, sumOfSquares etc. Is there a way to limit the stats to the sum and nothing else? 2. Is there going to be a significant
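
For context, the stats request itself is straightforward from SolrJ; the sketch below (the field name "amount" and URL are made up) shows where the sum comes back, though as far as I know the component computes the other statistics for the field regardless.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SumOnly {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);                    // only the stats are needed, not the documents
            q.set("stats", true);
            q.set("stats.field", "amount");  // hypothetical numeric field
            QueryResponse rsp = server.query(q);
            FieldStatsInfo stats = rsp.getFieldStatsInfo().get("amount");
            System.out.println("sum = " + stats.getSum());  // min/max/sumOfSquares etc. come back too
        }
    }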

Re: SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-04 Thread vybe3142
Fair enough, Thanks. Just wanted to confirm that there wasn't a better way of accomplishing this.

Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: "Severe errors in solr configuration."

2012-05-02 Thread vybe3142
I chronicled exactly what I had to configure to slay this dragon at http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/ Hope that helps

SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-02 Thread vybe3142
I can achieve this by building a query with start and rows = 0, and using .getResults().getNumFound(). Are there any more efficient approaches to this? Thanks
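
For what it's worth, the rows=0 approach described above boils down to something like this sketch (query string and URL are illustrative); the count still comes from the same search, so there is no cheaper call in SolrJ.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class QuickCount {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            SolrQuery q = new SolrQuery("text:tesla");   // hypothetical query
            q.setRows(0);                                // no documents returned, just the count
            long total = server.query(q).getResults().getNumFound();
            System.out.println("matches: " + total);
        }
    }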

Re: Dumb question: Streaming collector / query results

2012-05-02 Thread vybe3142
In other words... as an alternative, what's the most efficient way to gain access to all of the document ids that match a query?
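
One common pattern for this follow-up is to request only the unique key field and page through the result set. A rough sketch, assuming "id" is the unique key and plain start/rows paging (cursorMark did not exist in this Solr version); the query and URL are placeholders.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class AllMatchingIds {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            List<String> ids = new ArrayList<String>();
            int page = 1000;                              // fetch in chunks to bound memory
            SolrQuery q = new SolrQuery("text:tesla");    // hypothetical query
            q.setFields("id");                            // only the unique key, no stored text
            q.setRows(page);
            long start = 0, numFound;
            do {
                q.setStart((int) start);
                SolrDocumentList results = server.query(q).getResults();
                numFound = results.getNumFound();
                for (SolrDocument doc : results) {
                    ids.add((String) doc.getFieldValue("id"));
                }
                start += page;
            } while (start < numFound);
            System.out.println("collected " + ids.size() + " ids");
        }
    }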

Dumb question: Streaming collector / query results

2012-05-02 Thread vybe3142
I doubt if SOLR has this capability, given that it is based on a RESTful architecture, but I wanted to ask in case I'm mistaken. In Lucene, it is easier to gain a direct handle to the collector / scorer and access all the results as they're collected (as opposed to the SOLR query call that perfor

Re: Date granularity

2012-04-20 Thread vybe3142
... Inelegant as opposed to the possibility of using /DAY to specify day granularity on a single term query. In any case, if that's how SOLR works, that's fine. Any rough idea of the performance of range queries vs truncated day queries? Otherwise, I might just write up a quick program to compare t

Re: Date granularity

2012-04-19 Thread vybe3142
Also, what's the performance impact of range queries vs. querying for a particular DAY (as described in my last post) when the index contains, say, 10 million docs? If the range queries result in a significant performance hit, one option for us would be to define additional DAY fields when inde

Re: Date granularity

2012-04-19 Thread vybe3142
Thanks. So, I tried out the suggestions. I used the main query though (not a filter). 1. Using a DATE range and /DAY does give me the desired results. Specifically, the query that I used was 2. Without a DATE range, the parser seems to reduce the date to the beginning of the day i.e. 00:00:00 and a

Date granularity

2012-04-18 Thread vybe3142
A query search on a particular date: returns 1 valid result (as expected). How can I alter the granularity of the search, for example, to all matches on the particular DAY? Reading through various docs, I attempt to append "/DAY" but this doesn't seem to work (in fact I get 0 results back when qu
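
For anyone hitting the same question later: the date-math suffix only rounds an explicit date (or NOW) inside a range, so day granularity is usually expressed as a one-day range rather than by appending /DAY to a term query. A sketch, with a hypothetical field name "creation_date":

    import org.apache.solr.client.solrj.SolrQuery;

    public class DayGranularityQuery {
        public static SolrQuery forDay() {
            // Both endpoints round to midnight via /DAY; the upper bound adds one day.
            // The range is inclusive, so a doc stamped exactly at the next midnight also matches.
            return new SolrQuery(
                "creation_date:[2012-04-18T00:00:00Z/DAY TO 2012-04-18T00:00:00Z/DAY+1DAY]");
        }
    }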

Re: SOLR 4 / Date Query: Spurious Results: Is it me or ... ?

2012-04-18 Thread vybe3142
Thanks for clarifying. I figured out the (terms=-1). It was my fault. I attempted a truncate of the index in my test case setup by issuing a delete query, and I think the subsequent commit might not have taken effect by the time the subsequent index queries started.

SOLR 4 / Date Query: Spurious Results: Is it me or ... ?

2012-04-17 Thread vybe3142
I wrote a custom handler that uses externally injected metadata (bypassing Tika et al.). WRT dates, I see them associated with the correct docs when retrieving all docs. BUT, looking at the schema analyzer, things look weird: 1. Top terms = -1 2. The dates are all mixed up with some spurious 197

Re: SOLR 4 autocommit - is it working as I think it should?

2012-04-11 Thread vybe3142
Thanks, makes perfect sense

SOLR 4 autocommit - is it working as I think it should?

2012-04-11 Thread vybe3142
I've gotten past most of my initial hurdles with SOLR, with some useful suggestions from this group. Thank You. On to tweaking. This morning, I've been looking at the autocommit functionality as defined in solrconfig.xml. By default, it appears that it should kick in 15 seconds after a new doc
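
The behaviour described here is driven by the autoCommit block in solrconfig.xml; in the Solr 4 example config the relevant section looks roughly like the sketch below (values shown are the usual example defaults, not necessarily the poster's).

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>15000</maxTime>           <!-- hard commit at most 15 s after an uncommitted add -->
        <openSearcher>false</openSearcher> <!-- flush to disk without opening a new searcher -->
      </autoCommit>
    </updateHandler>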

Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142
Yonik Seeley-2-2 wrote:
> On Wed, Apr 4, 2012 at 3:14 PM, vybe3142 <vybe3142@> wrote:
>>> Updating a single field is not possible in solr. The whole record has to be rewritten.
>> Unfortunate. Lucene allows it.

Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142
> Updating a single field is not possible in solr. The whole record has to be rewritten.
Unfortunate. Lucene allows it.

Re: Incrementally updating a VERY LARGE field - Is this possible?

2012-04-04 Thread vybe3142
Thanks. Increasing max. heap space is not a scalable option as it reduces the ability of the system to scale with multiple concurrent index requests. The use case is indexing a set of text files which we have no control over, i.e. they could be small or large.

Incrementally updating a VERY LARGE field - Is this possible?

2012-04-03 Thread vybe3142
Some days ago, I posted about an issue with SOLR running out of memory when attempting to index large text files (say 300 MB). Details at http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html Two things I need to point out: 1. I don't need Ti

Thanks All, that worked (both via SOLRJ and the admin UI)

2012-04-02 Thread vybe3142
The query in question should be:

How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-03-30 Thread vybe3142
Here's a JOIN query using local params that I can successfully execute in a browser window: When I paste the relevant part of the query into the SOLR admin UI query interface, {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any documents. The query translates to: Serve
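
One thing worth noting: the '+' signs in the pasted query are URL-encoded spaces; typed literally into the admin UI or SolrJ they become plus characters, which may or may not have been the problem here. From SolrJ the raw local-params syntax can be passed with real spaces and the client handles the encoding. A sketch reusing the field names from the post (URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class JoinQueryExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            // Spaces, not '+', inside the local params; SolrJ URL-encodes the parameter itself.
            SolrQuery q = new SolrQuery("{!join from=join_id to=id}attributes_AUTHORS.4:4");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }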

Re: Unload(true) doesn't delete Index file when unloading a core

2012-03-30 Thread vybe3142
Thanks, good to know. I'll program around this.

Re: Unload(true) doesn't delete Index file when unloading a core

2012-03-28 Thread vybe3142
I'll try this again after restarting SOLR.

Unload(true) doesn't delete Index file when unloading a core

2012-03-27 Thread vybe3142
From what I understand, isn't the index file deletion an expected result? Thanks. public int drop(, boolean removeIndex) ===> removeIndex passed in as true throws Exception { String coreName = . Unload req = new Unload(removeIndex); req.setCore
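
For comparison, a minimal stock SolrJ unload-with-delete call looks like the sketch below (core name and URL are placeholders); whether the index directory actually gets removed was the bug under discussion in this thread.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class DropCore {
        public static void main(String[] args) throws Exception {
            // The core admin API lives at the container level, not under a specific core.
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            CoreAdminRequest.Unload unload = new CoreAdminRequest.Unload(true); // true = delete the index dir
            unload.setCoreName("testcore1");  // placeholder core name
            unload.process(server);
        }
    }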

Solr / Tika crashing when attempting to index large files

2012-03-21 Thread vybe3142
While waiting for someone to help answer my multicore config issue :) ... I decided to test SOLR's limits on a single instance/core config. We occasionally need to index large text files (that must not be broken up). This results in an out of memory error. I tried increasing Tomcat's heap size to

Re: org.apache.solr.common.SolrException: Internal Server Error

2012-03-21 Thread vybe3142
Try to obtain the server trace; that should tell you specifically what the error is.

Re: Can't locate contrib jars (solr 3.5 / solr 4) when indexing

2012-03-21 Thread vybe3142
My latest attempt: with the dist and contrib directories as subdirectories of "solrhost". Now, Tomcat seems to load the jars (see log below) but I still get the MimeTypeException when I try to index a file: http://pastebin.com/VuB4yauP

Re: Can't locate contrib jars (solr 3.5 / solr 4) when indexing

2012-03-21 Thread vybe3142
For the record etc doesn't work either

Can't locate contrib jars (solr 3.5 / solr 4) when indexing

2012-03-21 Thread vybe3142
I've had this issue trying to set up a multi core solr instance. It's obviously something basic I'm overlooking in the jar loading since the basic solr install works. The required classes should be in the jars located in the dist or contrib/*/lib dirs. SEVERE: null:java.lang.RuntimeException: java
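
In case it helps anyone searching the archives later: one common way to make the dist and contrib jars visible to a core is via lib directives in solrconfig.xml, with paths resolved relative to each core's instanceDir. The paths and regexes below are illustrative and depend on the actual layout.

    <!-- Illustrative paths; adjust relative to each core's instanceDir -->
    <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
    <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />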

Thanks All

2012-03-20 Thread vybe3142
Here is the core of the SOLRJ client that ended up accomplishing what I wanted:
String fileName2 = "C:\\work\\SolrClient\\data\\worldwartwo.txt";
SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
UpdateRequest req = new UpdateRequest("/up
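
The snippet above is cut off by the archive. A hedged reconstruction of what such a client typically looks like follows; the handler path, literal.id value, and the two-argument addFile are assumptions (3.x SolrJ takes only the File, 4.x adds a content type), not the poster's exact code.

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class IndexTextFile {
        public static void main(String[] args) throws Exception {
            SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
            // Assumed handler: the example /update/extract (ExtractingRequestHandler) endpoint.
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            // 4.x signature shown; 3.x SolrJ has addFile(File) without the content type.
            req.addFile(new File("C:\\work\\SolrClient\\data\\worldwartwo.txt"), "text/plain");
            req.setParam("literal.id", "worldwartwo");  // hypothetical unique key value
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            server.request(req);
        }
    }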

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
BTW, using the client I pasted, I get the same error even with the standard supplied executable SOLR jar.

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Thanks for the response. No, the file is plain text. All I'm trying to do is index plain ASCII text files via a remote reference to their file paths. I guess what I need to do is specify the content type as text. I don't think a "content-type" param will help since this behavior is tied to the

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Okay, I added the javabin handler snippet to the solrconfig.xml file (actually shared across all cores). I got further (the request made it past tomcat and into SOLR) but haven't quite succeeded yet. Server trace: Mar 19, 2012 3:31:35 PM org.apache.solr.core.SolrCore execute INFO: [testcore1] we

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Still no luck. Please help point out what I'm doing wrong. Neither the (commented-out) first approach (including the content with the request) nor the 2nd approach seems to work. Nothing seems to be acknowledged at the Tomcat server either. I get the error: Starting SOLR doc indexing client 2 Exc

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
I'm going to try the approach described here and see what happens: http://lucene.472066.n3.nabble.com/Fastest-way-to-use-solrj-td502659.html

Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
Thanks much. I plan to try this tomorrow. Can someone describe how to use remote streaming programmatically with SolrJ? For example, see the basic clients described here: http://androidyou.blogspot.com/2010/05/client-integration-with-solr-by-using.html and observe that the data is transferred in

Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-16 Thread vybe3142
Hi, Is there a way for SOLR / SOLRJ to index files directly, bypassing HTTP streaming? Use case:
* Text files to be indexed are on file server (A) (some potentially large - several hundred MB)
* SOLRJ client is on server (B)
* SOLR server is on server (C), running with dynamically created SOLR cores
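
One direction the follow-ups in this thread point at is Solr's remote streaming: if enableRemoteStreaming is turned on in solrconfig.xml and the Solr server itself can see the file path (e.g. via a shared mount), the client only sends the path, not the bytes. A bare-bones sketch using plain java.net rather than SolrJ; the core URL, handler, id, and path are placeholders.

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class RemoteStreamIndex {
        public static void main(String[] args) throws Exception {
            String solrCore = "http://solr-host:8080/solr/testcore1";   // placeholder core URL
            String path = "/mnt/fileserver/docs/big-document.txt";      // path visible to the Solr server
            String url = solrCore + "/update/extract"
                    + "?stream.file=" + URLEncoder.encode(path, "UTF-8")
                    + "&stream.contentType=" + URLEncoder.encode("text/plain", "UTF-8")
                    + "&literal.id=big-document&commit=true";
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            System.out.println("HTTP " + conn.getResponseCode());       // 200 means Solr read the file itself
            conn.disconnect();
        }
    }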