RE: Adding Custom-Parser to Tika

2012-06-09 Thread spring
> The doc is old. Tika hunts for parsers in the classpath now. > > http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072 "Re: tika-config.xml vs. META-INF/services/...; The service provider mechanism [1] mak

RE: Adding Custom-Parser to Tika

2012-06-08 Thread spring
The parser must be registered in the service registry (META-INF/services/org.apache.tika.parser.Parser). Just being on the classpath is not enough. > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: Friday, 8 June 2012 22:38 > To: solr-user@lucene.apache.org
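
A minimal sketch of the registration mechanism described above, using only the JDK's java.util.ServiceLoader rather than Tika itself. The Parser interface and MyParser class here are hypothetical stand-ins, not Tika classes, and the assumption is that Tika's lookup follows the same META-INF/services convention. The provider is on the classpath in both runs, but it is only found once the registration file exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ServiceRegistryDemo {

    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    public interface Parser {
        String name();
    }

    /** Hypothetical custom parser: always loadable, but not always registered. */
    public static class MyParser implements Parser {
        public String name() {
            return "my-parser";
        }
    }

    /** Returns the names of all Parser providers that ServiceLoader can see. */
    static List<String> discover(boolean register) {
        try {
            Path root = Files.createTempDirectory("registry-demo");
            if (register) {
                // The registration file is named after the service interface
                // and lists one implementation class name per line.
                Path services = Files.createDirectories(root.resolve("META-INF/services"));
                Files.writeString(services.resolve(Parser.class.getName()),
                        MyParser.class.getName() + "\n");
            }
            URL[] urls = { root.toUri().toURL() };
            // The temp directory plays the role of an extra jar on the classpath.
            try (URLClassLoader loader =
                     new URLClassLoader(urls, ServiceRegistryDemo.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                for (Parser parser : ServiceLoader.load(Parser.class, loader)) {
                    names.add(parser.name());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unregistered: " + discover(false));
        System.out.println("registered:   " + discover(true));
    }
}
```

For a real custom parser, the equivalent step would presumably be to ship a META-INF/services/org.apache.tika.parser.Parser file inside your own jar, which avoids editing the Tika jar.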

Adding Custom-Parser to Tika

2012-06-08 Thread spring
Hi, I have written a new parser for Tika. The problem is that I have to edit org.apache.tika.parser.Parser in the tika.jar. But I do not want to edit the jar. Is there another way to register the new parser? It must work with a plain AutoDetectParser, since this is used in other parsers directly (e.

RE: ReadTimeout on commit

2012-06-06 Thread spring
Hi Jack, hi Erik, thanks for the tips! It's Solr 3.6. I increased the batch size to 1000 docs and the timeout to 10 s. Now it works. I will also implement a retry around the commit call. Thx! > -Original Message- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Wednesday, 6. J
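
A rough sketch of what such a retry around the commit call could look like. It is plain Java with no SolrJ dependency: the lambda in demo() is a stand-in for the real commit call, and the attempt count and pause are illustrative assumptions, not values from this thread.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class CommitRetry {

    /** Runs the action up to maxAttempts times, pausing after each failure. */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // Every attempt failed: surface the most recent failure.
        throw last;
    }

    /** Simulates a commit that times out twice and then succeeds. */
    static String demo() {
        int[] calls = {0};
        try {
            String result = withRetry(() -> {
                // Stand-in for the real commit call (assumption, not SolrJ code).
                if (++calls[0] < 3) {
                    throw new SocketTimeoutException("Read timed out");
                }
                return "committed";
            }, 5, 10L);
            return result + " after " + calls[0] + " attempts";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Retrying only the commit, rather than the whole batch, keeps the documents that were already sent from being added twice.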

ReadTimeout on commit

2012-06-05 Thread spring
Hi, I'm indexing documents in batches of 100 docs. Then I commit. Sometimes I get this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java

RE: Wildcard-Search Solr 3.5.0

2012-05-25 Thread spring
> I don't know the specific rules in these specific stemmers, > but generally a > "less aggressive" stemming (e.g., "plural-only") of > "paintings" would be > "painting", while a "more aggressive" stemming would be > "paint". For some > "aggressive" stemmers the stemmed word is not even a wor

RE: Wildcard-Search Solr 3.5.0

2012-05-25 Thread spring
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you! > -Original Message- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Friday, 25 May 2012 03:25 > To: sol

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> I'd guess that this is because SnowballPorterFilterFactory > does not implement MultiTermAwareComponent. Not sure, though. Yes, I think this prevents the automatic multiterm awareness from doing its job. Could a custom analyzer chain help? Like described (very, very short, too short...) here:

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> Maybe a filter like ISOLatin1AccentFilter that doesn't get > applied when > using wildcards? How do the terms actually appear in the index? Bär gets indexed as bar. I do not use ISOLatin1AccentFilter. My field definition is this:

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> -Original Message- > From: Dmitry Kan [mailto:dmitry@gmail.com] > Sent: Wednesday, 23 May 2012 14:02 > To: solr-user@lucene.apache.org > Subject: Re: Wildcard-Search Solr 3.5.0 > > do umlauts arrive properly on the server side, no encoding > issues? Yes, works fine. It must, s

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
No. No hits for bä*. It's something with the umlauts but I have no idea what... > -Original Message- > From: Dmitry Kan [mailto:dmitry@gmail.com] > Sent: Wednesday, 23 May 2012 13:36 > To: solr-user@lucene.apache.org > Subject: Re: Wildcard-Search Solr 3.5.0 > > what about bä*->hits?

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
Does no one have an idea? Thx. > > The text may contain "FooBar". > > > > When I do a wildcard search like this: "Foo*" - no hits. > > When I do a wildcard search like this: "foo*" - doc is > > found. > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one

RE: Wildcard-Search Solr 3.5.0

2012-05-22 Thread spring
> > The text may contain "FooBar". > > > > When I do a wildcard search like this: "Foo*" - no hits. > > When I do a wildcard search like this: "foo*" - doc is > > found. > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use German

RE: Wildcard-Search Solr 3.5.0

2012-05-20 Thread spring
Hi Ahmet, > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis so your advice is to upgrade to 3.6? Thank you

Wildcard-Search Solr 3.5.0

2012-05-20 Thread spring
Hi, I have a tokenized text field with German content: The text may contain "FooBar". When I do a wildcard search like this: "Foo*" - no hits. When I
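
The behaviour described here is consistent with wildcard terms bypassing the field's analyzer in Solr 3.5, so "Foo*" is compared against lower-cased index terms. A possible client-side workaround for anyone staying on 3.5 is to normalize the wildcard term before sending it. The sketch below is an assumption-laden illustration: the field name "text" is hypothetical, and the accent folding only makes sense if the index really stores folded terms (elsewhere in this thread Bär is reported to be indexed as bar).

```java
import java.text.Normalizer;
import java.util.Locale;

final class WildcardTerms {

    /**
     * Lower-cases a wildcard term and, if requested, strips combining marks
     * so that "ä" becomes "a". Wildcard characters are left untouched.
     */
    static String normalize(String term, boolean foldAccents) {
        String result = term.toLowerCase(Locale.ROOT);
        if (foldAccents) {
            // Decompose accented letters, then drop the combining marks.
            result = Normalizer.normalize(result, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
        }
        return result;
    }

    public static void main(String[] args) {
        // "text" is a hypothetical field name.
        System.out.println("text:" + normalize("Foo*", false));
        // "B\u00e4*" is "Bä*" written with a Unicode escape.
        System.out.println("text:" + normalize("B\u00e4*", true));
    }
}
```

This does not reproduce stemming, so it only helps where the wildcard prefix is shorter than the stemmed term.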

RE: ExtractingRequestHandler

2012-04-02 Thread spring
> Solr Cell is great for proof-of-concept, but for heavy-duty > applications, > you're offloading all the processing on the Solr server, > which can be a > problem. Good point! Thank you

RE: ExtractingRequestHandler

2012-04-01 Thread spring
Hi Erik, I think we have a misunderstanding. I want to index the text of the docs in Solr (only indexed, NOT stored). But I want the text (Tika output) back for: * faster reindexing later (some text extraction like OCR takes really long) * using the text for other processing The original doc

RE: Content privacy, search & index

2012-03-31 Thread spring
> - Is it the best way to do that ? > - It's obvious that i need to index the registered users in > Solr (because an > user can search for others), but is it clever to index friend > list for each > user as well ? (if we take a look at the search box on > Facebook, or other > any sexy social net

RE: Client-side failover with SolrJ

2012-03-31 Thread spring
> Did you try > http://lucene.apache.org/solr/api/org/apache/solr/client/solrj > /impl/LBHttpSolrServer.html? > This might be what you're looking for. Cool! Thx!

ExtractingRequestHandler

2012-03-31 Thread spring
Hi, I want to index various file types in Solr, which can easily be done with ExtractingRequestHandler. But I also need the extracted content back. I know ext.extract.only but then nothing gets indexed, right? Can I index the document AND get the content back as with ext.extract.only? In a single requ

Client-side failover with SolrJ

2012-03-26 Thread spring
Hi, does SolrJ offer any way to fail over from a master to a slave for searching? Thank you

RE: OR-FilterQuery

2012-02-15 Thread spring
> In other words, there's no attempt to decompose the fq clause > and store parts of it in the cache, it's exact-match or > nothing. Ah ok, thank you.
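
If the filter cache only matches an fq clause as a whole, as the quoted reply says, then the same set of IDs written in a different order may well miss the cache. One way to guard against that on the client side is to build the clause in a canonical form. This is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class FilterQueries {

    /** Builds id:(a OR b OR c) with the IDs de-duplicated and sorted. */
    static String idFilter(Collection<Long> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty");
        }
        // TreeSet removes duplicates and sorts, so equal ID sets always
        // produce the same filter string regardless of input order.
        return new TreeSet<>(ids).stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR ", "id:(", ")"));
    }

    public static void main(String[] args) {
        System.out.println("fq=" + idFilter(List.of(3L, 1L, 2L)));
        System.out.println("fq=" + idFilter(List.of(2L, 3L, 1L, 1L)));
    }
}
```

Both calls in main yield the same fq value, so repeated requests for the same ID set can reuse one cache entry.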

RE: OR-FilterQuery

2012-02-15 Thread spring
> > q=some text > > fq=id:(1 OR 2 OR 3...) > > > > Should I better use q:some text AND id:(1 OR 2 OR 3...)? > > > 1. These two opts have the different scoring. > 2. if you hit same fq=id:(1 OR 2 OR 3...) many times you have > a benefit due > to reading docset from heap instead of searching on disk

OR-FilterQuery

2012-02-13 Thread spring
Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you

SolrJ Embedded

2012-01-16 Thread spring
Hi, is it possible to use the same index in a Solr webapp and additionally in an EmbeddedSolrServer? The embedded one would be read only. Thank you.

RE: GermanAnalyzer

2012-01-15 Thread spring
> > What is an equivalent fieldType definition in Solr 3.5? > > > > OK, and if I were to reindex, is this still the best practice config for German text?

GermanAnalyzer

2012-01-14 Thread spring
Hi, I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing indexes (huge...). In Lucene I use an untweaked org.apache.lucene.analysis.de.GermanAnalyzer. What is an equivalent fieldType definition in Solr 3.5? Thank you