GermanAnalyzer
Hi, I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing indexes (huge...). In Lucene I use an untweaked org.apache.lucene.analysis.de.GermanAnalyzer. What is an equivalent fieldType definition in Solr 3.5? Thank you
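For reference: as far as I recall, Lucene 2.3's GermanAnalyzer chains StandardTokenizer, StandardFilter, LowerCaseFilter, a German StopFilter and GermanStemFilter, so an equivalent Solr 3.5 fieldType might look like the sketch below. The type name and stopwords_de.txt file are assumptions - the stopword file would have to contain exactly GermanAnalyzer's default stop words, and the whole chain must match the old analysis exactly for the existing index to stay usable.

    <fieldType name="text_de_legacy" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- stopwords_de.txt must mirror GermanAnalyzer's built-in stop word list -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
        <filter class="solr.GermanStemFilterFactory"/>
      </analyzer>
    </fieldType>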
RE: GermanAnalyzer
> > What is an equivalent fieldType definition in Solr 3.5?

OK, and if I were to reindex, would this still be the best-practice config for German text?
SolrJ Embedded
Hi, is it possible to use the same index in a Solr webapp and additionally in an EmbeddedSolrServer? The embedded one would be read-only. Thank you.
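A minimal sketch of the embedded side, assuming the Solr 3.x SolrJ API and a hypothetical shared solr home path. Two caveats: the embedded server only sees the webapp's commits after reopening its searcher, and it must never write, or the two instances will fight over the index write lock.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ReadOnlyEmbedded {
        public static SolrServer open() throws Exception {
            // point the embedded instance at the same solr home the webapp uses
            System.setProperty("solr.solr.home", "/srv/solr");  // hypothetical path
            CoreContainer container = new CoreContainer.Initializer().initialize();
            return new EmbeddedSolrServer(container, "");       // "" = the default core
        }
    }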
OR-FilterQuery
Hi, how efficient is a query like this:

    q=some text
    fq=id:(1 OR 2 OR 3...)

Or should I rather use q=some text AND id:(1 OR 2 OR 3...)? Is the filter cache used for the OR'ed fq? Thank you
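As a sketch, the two variants in SolrJ - the fq string as a whole becomes one filterCache key, while folding the ids into q changes scoring and bypasses the filter cache:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery withFq = new SolrQuery("some text");
    withFq.addFilterQuery("id:(1 OR 2 OR 3)");  // cached as a single filterCache entry, no score impact

    SolrQuery withoutFq = new SolrQuery("some text AND id:(1 OR 2 OR 3)");  // ids influence scoring, not cached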
RE: OR-FilterQuery
> > q=some text
> > fq=id:(1 OR 2 OR 3...)
> >
> > Or should I rather use q=some text AND id:(1 OR 2 OR 3...)?
>
> 1. These two options score differently.
> 2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you benefit
> from reading the docset from the heap instead of searching on disk.

OK, understood. Thank you.
RE: OR-FilterQuery
> In other words, there's no attempt to decompose the fq clause
> and store parts of it in the cache; it's exact-match or nothing.

Ah OK, thank you.
Client-side failover with SolrJ
Hi, does SolrJ have any facility to fail over from a master to a slave for searching? Thank you
ExtractingRequestHandler
Hi, I want to index various file types in Solr; this can easily be done with ExtractingRequestHandler. But I also need the extracted content back. I know ext.extract.only, but then nothing gets indexed, right? Can I index the document AND get the content back as with ext.extract.only, in a single request? Thank you
RE: Client-side failover with SolrJ
> Did you try
> http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html?
> This might be what you're looking for.

Cool! Thx!
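For the archive, a minimal sketch (the URLs are placeholders; note the javadoc advises against using LBHttpSolrServer for indexing in a master/slave setup - send updates to the master directly):

    import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

    // round-robins queries across the servers; a dead server is skipped
    // and re-checked in the background (constructor throws MalformedURLException)
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://master:8983/solr",
        "http://slave1:8983/solr");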
RE: Content privacy, search & index
> - Is it the best way to do that?
> - It's obvious that I need to index the registered users in Solr (because a
> user can search for others), but is it clever to index the friend list for
> each user as well? (If we take a look at the search box on Facebook, or any
> other sexy social network, they propose auto-complete for the current user's
> friends, so maybe it makes sense...)

This is a common question: how to merge the result list from Solr (A) with a result list from elsewhere (B) (often an RDBMS, as in your case). Three options:

1) Do the merge in A: fetch the ids from B and do the merge in A (e.g. a filter query in Solr; be aware of maxBooleanClauses).
2) Do the merge in B: fetch the ids from A and do the merge in B (e.g. a subselect; has limitations with large numbers of ids too).
3) Do the merge in the application (C): fetch the ids from A and B and intersect them in C, as in the sketch below.

Depending on the size of the result sets, one of the three options is the best ;)
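A sketch of option 3, with hypothetical helpers standing in for the Solr query (A) and the database query (B):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    List<String> fromSolr = fetchIdsFromSolr();   // hypothetical helper (A)
    List<String> fromDb = fetchIdsFromDb();       // hypothetical helper (B)
    Set<String> merged = new HashSet<String>(fromSolr);
    merged.retainAll(fromDb);                     // intersection: ids present in both lists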
RE: ExtractingRequestHandler
Hi Erick, I think we have some misunderstanding. I want to index the text of the docs in Solr (only indexed, NOT stored). But I want the text (the Tika output) back for:

* later, faster reindexing (some text extraction, like OCR, takes really long)
* using the text for other processing

The original doc is NOT stored in Solr. So my question was whether I can index the original doc via ExtractingRequestHandler in Solr AND get back the text output, in a single call. AFAIK I can do it only in 2 calls:

1) ExtractingRequestHandler?ext.extract.only=true -> text
2) Index the text from 1) in Solr

Thx

> Yes, you can. But generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
>
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
>
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc). Would that work for your situation?
>
> Best
> Erick
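For the archive, the two-call workaround might look roughly like this in SolrJ (the file path, field names and the exact response key for the extracted content are assumptions; the extracted content comes back as XHTML by default):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.util.NamedList;

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    File file = new File("/docs/report.pdf");                  // hypothetical document

    // call 1: extract only, nothing gets indexed
    ContentStreamUpdateRequest extract = new ContentStreamUpdateRequest("/update/extract");
    extract.addFile(file);
    extract.setParam("extractOnly", "true");
    NamedList<Object> rsp = server.request(extract);
    String text = (String) rsp.get(file.getName());            // extracted content; key naming varies by version

    // call 2: index the extracted text (indexed-only field) plus whatever else is needed
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "report-1");                            // hypothetical fields
    doc.addField("text", text);
    server.add(doc);
    server.commit();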
RE: ExtractingRequestHandler
> Solr Cell is great for proof-of-concept, but for heavy-duty applications,
> you're offloading all the processing on the Solr server, which can be a
> problem.

Good point! Thank you
Wildcard-Search Solr 3.5.0
Hi, I have a tokenized text field with German content. The text may contain "FooBar".

When I do a wildcard search like this: "Foo*" - no hits.
When I do a wildcard search like this: "foo*" - the doc is found.

What's wrong here? Thank you
RE: Wildcard-Search Solr 3.5.0
Hi Ahmet,

> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

So your advice is to upgrade to 3.6? Thank you
RE: Wildcard-Search Solr 3.5.0
> > The text may contain "FooBar".
> >
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - the doc is found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

Well, it works in 3.6 - with one exception: with German umlauts it does not work anymore.

Text: Bär

Bä* -> no hits
Bär -> hits

What can I do in this case? Thank you
RE: Wildcard-Search Solr 3.5.0
No one has an idea? It's the umlaut + wildcard combination described above (text "Bär": Bä* -> no hits, Bär -> hits). Thx.
RE: Wildcard-Search Solr 3.5.0
No. No hits for bä*. It's something with the umlauts, but I have no idea what...

> what about bä* -> hits?
>
> -- Dmitry
RE: Wildcard-Search Solr 3.5.0
> do umlauts arrive properly on the server side, no encoding issues?

Yes, that works fine. It must, since I get hits for Bär and bär. It's just the combination of umlauts and wildcards. Must be something with the automagic multiterm feature in Solr 3.6.
RE: Wildcard-Search Solr 3.5.0
> Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when
> using wildcards? How do the terms actually appear in the index?

Bär gets indexed as bar. I don't use ISOLatin1AccentFilter. My field def is this:
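The field definition was cut off here. Based on the SnowballPorterFilterFactory guess in the next message, it presumably looked something like this reconstruction (entirely hypothetical):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
        <!-- the German Snowball stemmer folds umlauts (Bär -> bar), which would
             explain the indexed terms reported above -->
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      </analyzer>
    </fieldType>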
RE: Wildcard-Search Solr 3.5.0
> I'd guess that this is because SnowballPorterFilterFactory
> does not implement MultiTermAwareComponent. Not sure, though.

Yes, I think this hinders the automagic multiterm awareness from doing its job. Could a custom analyzer chain help? Like described (very, very briefly - too briefly...) here: http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you!

> I tried it, and it does appear to be the SnowballPorterFilterFactory that
> normally does the accent folding but can't here, because it is not multi-term
> aware. I did notice that the text_de field type that comes in the Solr 3.6
> example schema handles your case fine. It uses the
> GermanNormalizationFilterFactory to fold accented characters and is
> multi-term aware. Any particular reason you're not using the stock text_de
> field type? It also has three stemming options which might be sufficient for
> your needs.
>
> In any case, try to make your text_de field type closer to the stock
> version, and try to use GermanNormalizationFilterFactory; that may be
> good enough for your situation.
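From memory, the stock text_de type in the 3.6 example schema is roughly the following (exact attributes may differ); GermanNormalizationFilterFactory is multi-term aware, which is why bä* works against it:

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <!-- stemming alternatives: GermanMinimalStemFilterFactory (less aggressive),
             SnowballPorterFilterFactory language="German2" (more aggressive) -->
        <filter class="solr.GermanLightStemFilterFactory"/>
      </analyzer>
    </fieldType>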
RE: Wildcard-Search Solr 3.5.0
> I don't know the specific rules in these specific stemmers, but generally a
> "less aggressive" stemming (e.g., "plural-only") of "paintings" would be
> "painting", while a "more aggressive" stemming would be "paint". For some
> "aggressive" stemmers the stemmed word is not even a word.

Sounds logical :)

> It would be nice to have docs with some example words for each stemmer.

Absolutely! Thx a lot!
ReadTimeout on commit
Hi,

I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

    org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. That is unfortunately not an option for me, because I have to know whether Solr committed or not.

What is causing this timeout? I'm using these settings in SolrJ:

    server.setSoTimeout(1000);
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false);
    server.setAllowCompression(true);
    server.setMaxRetries(1);

Thank you
RE: ReadTimeout on commit
Hi Jack, hi Erick, thanks for the tips! It's Solr 3.6. I increased the batch to 1000 docs and the timeout to 10 s. Now it works. And I will implement the retry around the commit call, as in the sketch below. Thx!

> As Erick says, you are probably hitting an occasional automatic background
> merge which takes a bit longer. That is not an indication of a problem.
> Increase your connection timeout. Check the log to see how long the merge or
> "slow commit" takes. You have a timeout of 1000, which is 1 second. Make it
> longer, and possibly put the commit or other indexing operations in a loop
> with a few retries before considering a connection timeout a fatal error.
> Occasional delays are a fact of life in a multi-process, networked
> environment.
>
> -- Jack Krupansky

> You're probably hitting a background merge and the request is timing
> out even though the commit succeeds. Try querying for the data in
> the last packet to test this.
>
> And you don't say what version of Solr you're using.
>
> One test you can do is increase the number of documents before
> a commit. If merging is the problem I'd expect you to _still_ encounter
> this problem, just much less often. That would at least tell you if this
> is the right path to investigate.
>
> Best
> Erick
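A sketch of such a retry loop (the attempt count and back-off are arbitrary); note that the commit may well have succeeded on the server even though the read timed out, so re-sending a commit is harmless:

    import java.net.SocketTimeoutException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;

    void commitWithRetry(SolrServer server) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                server.commit();        // blocks until the commit (and any merge) finishes
                return;
            } catch (SolrServerException e) {
                boolean timedOut = e.getRootCause() instanceof SocketTimeoutException;
                if (!timedOut || attempt >= 3) {
                    throw e;            // not a timeout, or out of retries
                }
                Thread.sleep(5000);     // back off, then re-send the commit
            }
        }
    }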
Adding Custom-Parser to Tika
Hi, I have written a new parser for Tika. The problem is that I have to edit org.apache.tika.parser.Parser in the tika.jar, but I do not want to edit the jar. Is there another way to register the new parser? It must work with a plain AutoDetectParser, since this is used in other parsers directly (e.g. RFC822Parser). Thank you.
RE: Adding Custom-Parser to Tika
The parser must get registered in the service registry (META-INF/services/org.apache.tika.parser.Parser). Just being on the classpath does not work.

> Solr will find libs in the top-level directory solr/lib (next to solr.xml)
> or a lib/ directory inside each core directory. You can put your new
> parser in a jar file in one of those places. Like this:
>
> solr/
> solr/solr.xml
> solr/lib
> solr/lib/yourjar.jar
> solr/collection1
> solr/collection1/conf
> solr/collection1/lib
> solr/collection1/lib/yourjar.jar
>
> -- Lance Norskog
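For the archive, the registration might look like this sketch (the jar and parser class names are made up): the jar carries a plain-text service file listing the custom parser, one fully-qualified class name per line.

    my-custom-parsers.jar
        META-INF/services/org.apache.tika.parser.Parser    <- plain-text service file
        com/example/MyRfc822Parser.class

    # contents of META-INF/services/org.apache.tika.parser.Parser:
    com.example.MyRfc822Parser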
RE: Adding Custom-Parser to Tika
> The doc is old. Tika hunts for parsers in the classpath now.
>
> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072

"Re: tika-config.xml vs. META-INF/services/...: The service provider mechanism [1] makes it easy to add custom parser implementations without having to maintain a separate copy of the full Tika configuration file. You could for example create a my-custom-parsers.jar file with a META-INF/services/o.a.tika.parser.Parser file that lists only your custom parser classes. When you add that jar to the classpath, Tika would then automatically pick up those parsers in addition to the standard parser classes from the tika-parsers jar."

This is exactly what I tried, but it did not work. I'm using Tika 1.1.