> The doc is old. Tika hunts for parsers in the classpath now.
>
> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072
"Re: tika-config.xml vs. META-INF/services/...; The service provider
mechanism [1] mak
The parser must get registered in the service registry
(META-INF/services/org.apache.tika.parser.Parser). Just being in the
classpath does not work.
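
For illustration, a registry entry might look like the fragment below. This is only a sketch: com.example.MyParser is a placeholder class name, and it assumes Tika follows the standard Java service-provider lookup, which collects every META-INF/services file with this name found on the classpath. If that holds for the Tika version in use, the file can ship inside your own jar and tika.jar stays untouched.

```text
# File in your own jar: META-INF/services/org.apache.tika.parser.Parser
# One fully qualified parser class name per line.
# com.example.MyParser is a placeholder, not a real class.
com.example.MyParser
```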
> -Original Message-
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: Friday, 8 June 2012 22:38
> To: solr-user@lucene.apache.org
Hi,
I have written a new parser for Tika. The problem is that I have to edit
org.apache.tika.parser.Parser in the tika.jar, but I do not want to edit the
jar. Is there another way to register the new parser? It must work with a
plain AutoDetectParser, since this is used in other parsers directly (e.
Hi Jack, hi Erik,
thanks for the tips! It's Solr 3.6.
I increased the batch to 1000 docs and the timeout to 10 s. Now it works.
And I will implement the retry around the commit-call.
Thx!
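
A retry around the commit call could look roughly like the sketch below. It is plain Java without SolrJ: the class CommitRetry, the method withRetry and the pause value are made up for illustration, and the Callable stands in for the real commit call. It retries on any exception; real code would probably retry only on timeouts.

```java
import java.util.concurrent.Callable;

final class CommitRetry {
    /** Runs the action, retrying up to maxAttempts times with a fixed pause between attempts. */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure, then pause unless this was the final attempt
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // every attempt failed: rethrow the most recent exception
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for the commit call: times out twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.net.SocketTimeoutException("Read timed out");
            }
            return "committed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying is reasonable here because repeating a commit has no side effects beyond the commit itself; retrying the add of a batch would need more care.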
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Mittwoch, 6. J
Hi,
I'm indexing documents in batches of 100 docs. Then commit.
Sometimes I get this exception:
org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java
> I don't know the specific rules in these specific stemmers, but generally a
> "less aggressive" stemming (e.g., "plural-only") of "paintings" would be
> "painting", while a "more aggressive" stemming would be "paint". For some
> "aggressive" stemmers the stemmed word is not even a wor
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field
type. These two options... less / more aggressive. Aggressive in terms of
what?
Thank you!
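
To make the less / more aggressive distinction concrete, here is a toy sketch. The two rules are invented for illustration and are not the rules of any Solr or Snowball stemmer; they only reproduce the "paintings" example from the reply.

```java
import java.util.Locale;

final class ToyStemmers {
    /** Light ("plural-only") stemming: strips a trailing plural "s", nothing else. */
    static String light(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Aggressive stemming: applies the light rule, then also strips a trailing "ing". */
    static String aggressive(String word) {
        String w = light(word);
        if (w.length() > 5 && w.endsWith("ing")) {
            return w.substring(0, w.length() - 3);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"paintings", "painting", "paint"}) {
            System.out.println(word + ": light=" + light(word) + ", aggressive=" + aggressive(word));
        }
    }
}
```

With the aggressive variant all three words collapse to the same term, so a search for "paint" also finds "paintings": more hits, but less precise ones.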
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Friday, 25 May 2012 03:25
> To: sol
> I'd guess that this is because SnowballPorterFilterFactory
> does not implement MultiTermAwareComponent. Not sure, though.
Yes, I think this prevents the automagic multiterm awareness from doing its
job.
Could a custom analyzer chain help? Like
described (very, very short, too short...) here:
> Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when
> using wildcards? How do the terms actually appear in the index?
Bär gets indexed as bar.
I do not use ISOLatin1AccentFilter. My field def is this:
> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com]
> Sent: Wednesday, 23 May 2012 14:02
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
>
> do umlauts arrive properly on the server side, no encoding issues?
Yes, works fine.
It must, s
No. No hits for bä*.
It's something with the umlauts but I have no idea what...
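
The symptom would fit a wildcard term that skips the analysis the indexed terms went through. The sketch below only simulates that idea in plain Java: fold() is a made-up stand-in for whatever the real index-time chain does (here lowercasing plus stripping diacritics), and prefixSearch() is a naive prefix match, not Lucene code.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class WildcardFolding {
    /** Stand-in for index-time analysis: lowercase, then strip diacritics. */
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", ""); // drop the combining marks left by NFD
    }

    /** Prefix match against already-folded index terms; the prefix is folded only if foldQuery is true. */
    static List<String> prefixSearch(List<String> indexedTerms, String prefix, boolean foldQuery) {
        String p = foldQuery ? fold(prefix) : prefix;
        return indexedTerms.stream().filter(t -> t.startsWith(p)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "B\u00e4r" is the word with a-umlaut, written as an escape to stay encoding-safe.
        List<String> index = Arrays.asList(fold("FooBar"), fold("B\u00e4r"));
        System.out.println("index terms: " + index);
        System.out.println("raw Foo*: " + prefixSearch(index, "Foo", false));
        System.out.println("raw umlaut prefix: " + prefixSearch(index, "b\u00e4", false));
        System.out.println("folded umlaut prefix: " + prefixSearch(index, "b\u00e4", true));
    }
}
```

In this simulation the raw prefixes find nothing because the index only holds folded terms, while the folded prefix matches. Whether the same mechanism is at work in the real field depends on its actual analyzer chain.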
> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com]
> Sent: Wednesday, 23 May 2012 13:36
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
>
> what about bä*->hits?
Does no one have an idea?
Thx.
> > The text may contain "FooBar".
> >
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - doc is
> > found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
Well, it works in 3.6. With one exception: If I use German
Hi Ahmet,
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
So your advice is to upgrade to 3.6?
Thank you
Hi,
I have a tokenized text field with German content:
The text may contain "FooBar".
When I do a wildcard search like this: "Foo*" - no hits.
When I
> Solr Cell is great for proof-of-concept, but for heavy-duty applications,
> you're offloading all the processing on the Solr server, which can be a
> problem.
Good point!
Thank you
Hi Erik,
I think we have some misunderstanding.
I want to index the text of the docs in Solr (only indexed, NOT stored).
But I want the text (Tika output) back for:
* faster reindexing later (some text extraction, like OCR, takes really long)
* using the text for other processing
The original doc
> - Is it the best way to do that?
> - It's obvious that I need to index the registered users in Solr (because a
> user can search for others), but is it clever to index the friend list for
> each user as well? (if we take a look at the search box on Facebook, or other
> any sexy social net
> Did you try
> http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html ?
> This might be what you're looking for.
Cool!
Thx!
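
The failover idea itself can be sketched without SolrJ, as below. FailoverClient is a made-up class, not the LBHttpSolrServer API: each "server" is just a function from query to result, and the servers are tried strictly in the given order (master first, then slave). LBHttpSolrServer, going by its name, balances load across servers, which is a different policy from this fixed ordering.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class FailoverClient<Q, R> {
    private final List<Function<Q, R>> servers;

    FailoverClient(List<Function<Q, R>> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = servers;
    }

    /** Tries each server in order and returns the first successful answer. */
    R query(Q q) {
        RuntimeException last = null;
        for (Function<Q, R> server : servers) {
            try {
                return server.apply(q);
            } catch (RuntimeException e) {
                last = e; // this server failed, fall through to the next one
            }
        }
        throw last; // all servers failed: rethrow the most recent failure
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins: the master is down, the slave answers.
        Function<String, String> master = q -> { throw new IllegalStateException("master down"); };
        Function<String, String> slave = q -> "slave answered: " + q;
        FailoverClient<String, String> client = new FailoverClient<>(Arrays.asList(master, slave));
        System.out.println(client.query("some text"));
    }
}
```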
Hi,
I want to index various file types in Solr; this can easily be done with
ExtractingRequestHandler. But I also need the extracted content back.
I know ext.extract.only but then nothing gets indexed, right?
Can I index the document AND get the content back as with ext.extract.only?
In a single requ
Hi,
does SolrJ offer any way to fail over from a master to a slave for
searching?
Thank you
> In other words, there's no attempt to decompose the fq clause
> and store parts of it in the cache, it's exact-match or
> nothing.
Ah ok, thank you.
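
The exact-match behaviour can be pictured with a toy cache like the one below. It is a simplification, not Solr's filter cache: ToyFilterCache is a made-up class keyed by the raw fq string, and the search function is a stand-in for the real index lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class ToyFilterCache {
    private final Map<String, Set<Integer>> cache = new HashMap<>();
    private final Function<String, Set<Integer>> search;
    int misses = 0; // how often the (expensive) search had to run

    ToyFilterCache(Function<String, Set<Integer>> search) {
        this.search = search;
    }

    /** Returns the doc set for the whole fq clause; only an identical clause is a cache hit. */
    Set<Integer> docSet(String fq) {
        Set<Integer> hit = cache.get(fq);
        if (hit != null) {
            return hit;
        }
        misses++;
        Set<Integer> result = search.apply(fq);
        cache.put(fq, result);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in search that always returns the same three doc ids.
        ToyFilterCache fc = new ToyFilterCache(fq -> new HashSet<>(Arrays.asList(1, 2, 3)));
        fc.docSet("id:(1 OR 2 OR 3)");
        fc.docSet("id:(1 OR 2 OR 3)"); // identical clause: served from the cache
        fc.docSet("id:(1 OR 2)");      // part of the first clause, but still a miss
        System.out.println("misses: " + fc.misses);
    }
}
```

So a long id list pays off as an fq only if exactly the same list comes back in later requests.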
> > q=some text
> > fq=id:(1 OR 2 OR 3...)
> >
> > Should I better use q:some text AND id:(1 OR 2 OR 3...)?
> >
> 1. These two options score differently.
> 2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you benefit from
> reading the docset from the heap instead of searching on disk
Hi,
how efficient is a query like this:
q=some text
fq=id:(1 OR 2 OR 3...)
Would it be better to use q:some text AND id:(1 OR 2 OR 3...)?
Is the filter cache used for the OR'ed fq?
Thank you
Hi,
is it possible to use the same index in a Solr webapp and additionally in an
EmbeddedSolrServer? The embedded one would be read-only.
Thank you.
> > What is an equivalent fieldType definition in Solr 3.5?
>
>
>
>
OK, and if I were to reindex, is this still the best-practice config for
German text?
Hi,
I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing
indexes (huge...).
In Lucene I use an untweaked org.apache.lucene.analysis.de.GermanAnalyzer.
What is an equivalent fieldType definition in Solr 3.5?
Thank you