GermanAnalyzer
Hi, I'm switching from Lucene 2.3 to Solr 3.5. I want to reuse the existing indexes (huge...). In Lucene I use an untweaked org.apache.lucene.analysis.de.GermanAnalyzer. What is an equivalent fieldType definition in Solr 3.5? Thank you
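For reference: as far as I recall, Lucene 2.3's GermanAnalyzer chains StandardTokenizer, StandardFilter, LowerCaseFilter, a German StopFilter and GermanStemFilter, so an equivalent Solr 3.5 fieldType might look like the sketch below. The type name and stopwords_de.txt file are assumptions - the stopword file would have to contain exactly GermanAnalyzer's default stop words, and the whole chain must match the old analysis exactly for the existing index to stay usable.

    <fieldType name="text_de_legacy" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- stopwords_de.txt must mirror GermanAnalyzer's built-in stop word list -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
        <filter class="solr.GermanStemFilterFactory"/>
      </analyzer>
    </fieldType>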
RE: GermanAnalyzer
> > What is an equivalent fieldType definition in Solr 3.5?

OK, and if I were to reindex, would this still be the best-practice config for German text?
SolrJ Embedded
Hi, is it possible to use the same index in a Solr webapp and additionally in an EmbeddedSolrServer? The embedded one would be read-only. Thank you.
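A minimal sketch of the embedded side, assuming the Solr 3.x SolrJ API and a hypothetical shared solr home path. Two caveats: the embedded server only sees the webapp's commits after reopening its searcher, and it must never write, or the two instances will fight over the index write lock.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ReadOnlyEmbedded {
        public static SolrServer open() throws Exception {
            // point the embedded instance at the same solr home the webapp uses
            System.setProperty("solr.solr.home", "/srv/solr");  // hypothetical path
            CoreContainer container = new CoreContainer.Initializer().initialize();
            return new EmbeddedSolrServer(container, "");       // "" = the default core
        }
    }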
OR-FilterQuery
Hi, how efficient is a query like this:

    q=some text
    fq=id:(1 OR 2 OR 3...)

Or should I rather use q=some text AND id:(1 OR 2 OR 3...)? Is the filter cache used for the OR'ed fq? Thank you
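As a sketch, the two variants in SolrJ - the fq string as a whole becomes one filterCache key, while folding the ids into q changes scoring and bypasses the filter cache:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery withFq = new SolrQuery("some text");
    withFq.addFilterQuery("id:(1 OR 2 OR 3)");  // cached as a single filterCache entry, no score impact

    SolrQuery withoutFq = new SolrQuery("some text AND id:(1 OR 2 OR 3)");  // ids influence scoring, not cached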
RE: OR-FilterQuery
> > q=some text
> > fq=id:(1 OR 2 OR 3...)
> >
> > Or should I rather use q=some text AND id:(1 OR 2 OR 3...)?
>
> 1. These two options score differently.
> 2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you benefit
> from reading the docset from the heap instead of searching on disk.

OK, understood. Thank you.
RE: OR-FilterQuery
> In other words, there's no attempt to decompose the fq clause
> and store parts of it in the cache; it's exact-match or nothing.

Ah OK, thank you.
Client-side failover with SolrJ
Hi, does SolrJ have any facility to fail over from a master to a slave for searching? Thank you
ExtractingRequestHandler
Hi, I want to index various file types in Solr; this can easily be done with ExtractingRequestHandler. But I also need the extracted content back. I know ext.extract.only, but then nothing gets indexed, right? Can I index the document AND get the content back as with ext.extract.only, in a single request? Thank you
RE: Client-side failover with SolrJ
> Did you try
> http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html?
> This might be what you're looking for.

Cool! Thx!
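For the archive, a minimal sketch (the URLs are placeholders; note the javadoc advises against using LBHttpSolrServer for indexing in a master/slave setup - send updates to the master directly):

    import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

    // round-robins queries across the servers; a dead server is skipped
    // and re-checked in the background (constructor throws MalformedURLException)
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://master:8983/solr",
        "http://slave1:8983/solr");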
RE: Content privacy, search & index
> - Is it the best way to do that?
> - It's obvious that I need to index the registered users in Solr (because a
> user can search for others), but is it clever to index the friend list for
> each user as well? (If we take a look at the search box on Facebook, or any
> other sexy social network, they propose auto-complete for the current user's
> friends, so maybe it makes sense...)

This is a common question: how to merge the result list from Solr (A) with a result list from elsewhere (B) (often an RDBMS, as in your case). Three options:

1) Do the merge in A: fetch the ids from B and do the merge in A (e.g. a filter query in Solr; be aware of maxBooleanClauses).
2) Do the merge in B: fetch the ids from A and do the merge in B (e.g. a subselect; has limitations with large numbers of ids too).
3) Do the merge in the application (C): fetch the ids from A and B and intersect them in C, as in the sketch below.

Depending on the size of the result sets, one of the three options is the best ;)
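A sketch of option 3, with hypothetical helpers standing in for the Solr query (A) and the database query (B):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    List<String> fromSolr = fetchIdsFromSolr();   // hypothetical helper (A)
    List<String> fromDb = fetchIdsFromDb();       // hypothetical helper (B)
    Set<String> merged = new HashSet<String>(fromSolr);
    merged.retainAll(fromDb);                     // intersection: ids present in both lists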
RE: ExtractingRequestHandler
Hi Erick, I think we have some misunderstanding. I want to index the text of the docs in Solr (only indexed, NOT stored). But I want the text (the Tika output) back for:

* later, faster reindexing (some text extraction, like OCR, takes really long)
* using the text for other processing

The original doc is NOT stored in Solr. So my question was whether I can index the original doc via ExtractingRequestHandler in Solr AND get back the text output, in a single call. AFAIK I can do it only in 2 calls:

1) ExtractingRequestHandler?ext.extract.only=true -> text
2) Index the text from 1) in Solr

Thx

> Yes, you can. But generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
>
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
>
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc). Would that work for your situation?
>
> Best
> Erick
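For the archive, the two-call workaround might look roughly like this in SolrJ (the file path, field names and the exact response key for the extracted content are assumptions; the extracted content comes back as XHTML by default):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.util.NamedList;

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    File file = new File("/docs/report.pdf");                  // hypothetical document

    // call 1: extract only, nothing gets indexed
    ContentStreamUpdateRequest extract = new ContentStreamUpdateRequest("/update/extract");
    extract.addFile(file);
    extract.setParam("extractOnly", "true");
    NamedList<Object> rsp = server.request(extract);
    String text = (String) rsp.get(file.getName());            // extracted content; key naming varies by version

    // call 2: index the extracted text (indexed-only field) plus whatever else is needed
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "report-1");                            // hypothetical fields
    doc.addField("text", text);
    server.add(doc);
    server.commit();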
RE: ExtractingRequestHandler
> Solr Cell is great for proof-of-concept, but for heavy-duty applications,
> you're offloading all the processing on the Solr server, which can be a
> problem.

Good point! Thank you
Wildcard-Search Solr 3.5.0
Hi, I have a tokenized text field with German content. The text may contain "FooBar".

When I do a wildcard search like this: "Foo*" - no hits.
When I do a wildcard search like this: "foo*" - the doc is found.

What's wrong here? Thank you
RE: Wildcard-Search Solr 3.5.0
Hi Ahmet,

> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

So your advice is to upgrade to 3.6? Thank you
RE: Wildcard-Search Solr 3.5.0
> > The text may contain "FooBar".
> >
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - the doc is found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

Well, it works in 3.6 - with one exception: with German umlauts it does not work anymore.

Text: Bär

Bä* -> no hits
Bär -> hits

What can I do in this case? Thank you
RE: Wildcard-Search Solr 3.5.0
No one has an idea? It's the umlaut + wildcard combination described above (text "Bär": Bä* -> no hits, Bär -> hits). Thx.
RE: Wildcard-Search Solr 3.5.0
No. No hits for bä*. It's something with the umlauts, but I have no idea what...

> what about bä* -> hits?
>
> -- Dmitry
RE: Wildcard-Search Solr 3.5.0
> do umlauts arrive properly on the server side, no encoding issues?

Yes, that works fine. It must, since I get hits for Bär and bär. It's just the combination of umlauts and wildcards. Must be something with the automagic multiterm feature in Solr 3.6.
RE: Wildcard-Search Solr 3.5.0
> Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when
> using wildcards? How do the terms actually appear in the index?

Bär gets indexed as bar. I don't use ISOLatin1AccentFilter. My field def is this:
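The field definition was cut off here. Based on the SnowballPorterFilterFactory guess in the next message, it presumably looked something like this reconstruction (entirely hypothetical):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
        <!-- the German Snowball stemmer folds umlauts (Bär -> bar), which would
             explain the indexed terms reported above -->
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      </analyzer>
    </fieldType>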
RE: Wildcard-Search Solr 3.5.0
> I'd guess that this is because SnowballPorterFilterFactory
> does not implement MultiTermAwareComponent. Not sure, though.

Yes, I think this hinders the automagic multiterm awareness from doing its job. Could a custom analyzer chain help? Like described (very, very briefly - too briefly...) here: http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you!

> I tried it, and it does appear to be the SnowballPorterFilterFactory that
> normally does the accent folding but can't here, because it is not multi-term
> aware. I did notice that the text_de field type that comes in the Solr 3.6
> example schema handles your case fine. It uses the
> GermanNormalizationFilterFactory to fold accented characters and is
> multi-term aware. Any particular reason you're not using the stock text_de
> field type? It also has three stemming options which might be sufficient for
> your needs.
>
> In any case, try to make your text_de field type closer to the stock
> version, and try to use GermanNormalizationFilterFactory; that may be
> good enough for your situation.
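From memory, the stock text_de type in the 3.6 example schema is roughly the following (exact attributes may differ); GermanNormalizationFilterFactory is multi-term aware, which is why bä* works against it:

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <!-- stemming alternatives: GermanMinimalStemFilterFactory (less aggressive),
             SnowballPorterFilterFactory language="German2" (more aggressive) -->
        <filter class="solr.GermanLightStemFilterFactory"/>
      </analyzer>
    </fieldType>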
RE: Wildcard-Search Solr 3.5.0
> I don't know the specific rules in these specific stemmers, but generally a
> "less aggressive" stemming (e.g., "plural-only") of "paintings" would be
> "painting", while a "more aggressive" stemming would be "paint". For some
> "aggressive" stemmers the stemmed word is not even a word.

Sounds logical :)

> It would be nice to have docs with some example words for each stemmer.

Absolutely! Thx a lot!
ReadTimeout on commit
Hi,

I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

    org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. That is unfortunately not an option for me, because I have to know whether Solr committed or not.

What is causing this timeout? I'm using these settings in SolrJ:

    server.setSoTimeout(1000);
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false);
    server.setAllowCompression(true);
    server.setMaxRetries(1);

Thank you
RE: ReadTimeout on commit
Hi Jack, hi Erick, thanks for the tips! It's Solr 3.6. I increased the batch to 1000 docs and the timeout to 10 s. Now it works. And I will implement the retry around the commit call, as in the sketch below. Thx!

> As Erick says, you are probably hitting an occasional automatic background
> merge which takes a bit longer. That is not an indication of a problem.
> Increase your connection timeout. Check the log to see how long the merge or
> "slow commit" takes. You have a timeout of 1000, which is 1 second. Make it
> longer, and possibly put the commit or other indexing operations in a loop
> with a few retries before considering a connection timeout a fatal error.
> Occasional delays are a fact of life in a multi-process, networked
> environment.
>
> -- Jack Krupansky

> You're probably hitting a background merge and the request is timing
> out even though the commit succeeds. Try querying for the data in
> the last packet to test this.
>
> And you don't say what version of Solr you're using.
>
> One test you can do is increase the number of documents before
> a commit. If merging is the problem I'd expect you to _still_ encounter
> this problem, just much less often. That would at least tell you if this
> is the right path to investigate.
>
> Best
> Erick
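A sketch of such a retry loop (the attempt count and back-off are arbitrary); note that the commit may well have succeeded on the server even though the read timed out, so re-sending a commit is harmless:

    import java.net.SocketTimeoutException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;

    void commitWithRetry(SolrServer server) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                server.commit();        // blocks until the commit (and any merge) finishes
                return;
            } catch (SolrServerException e) {
                boolean timedOut = e.getRootCause() instanceof SocketTimeoutException;
                if (!timedOut || attempt >= 3) {
                    throw e;            // not a timeout, or out of retries
                }
                Thread.sleep(5000);     // back off, then re-send the commit
            }
        }
    }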
Adding Custom-Parser to Tika
Hi, I have written a new parser for Tika. The problem is that I have to edit org.apache.tika.parser.Parser in the tika.jar, but I do not want to edit the jar. Is there another way to register the new parser? It must work with a plain AutoDetectParser, since this is used in other parsers directly (e.g. RFC822Parser). Thank you.
RE: Adding Custom-Parser to Tika
The parser must get registered in the service registry (META-INF/services/org.apache.tika.parser.Parser). Just being on the classpath does not work.

> Solr will find libs in the top-level directory solr/lib (next to solr.xml)
> or a lib/ directory inside each core directory. You can put your new
> parser in a jar file in one of those places. Like this:
>
> solr/
> solr/solr.xml
> solr/lib
> solr/lib/yourjar.jar
> solr/collection1
> solr/collection1/conf
> solr/collection1/lib
> solr/collection1/lib/yourjar.jar
>
> -- Lance Norskog
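For the archive, the registration might look like this sketch (the jar and parser class names are made up): the jar carries a plain-text service file listing the custom parser, one fully-qualified class name per line.

    my-custom-parsers.jar
        META-INF/services/org.apache.tika.parser.Parser    <- plain-text service file
        com/example/MyRfc822Parser.class

    # contents of META-INF/services/org.apache.tika.parser.Parser:
    com.example.MyRfc822Parser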
RE: Adding Custom-Parser to Tika
> The doc is old. Tika hunts for parsers in the classpath now.
>
> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072

"Re: tika-config.xml vs. META-INF/services/...: The service provider mechanism [1] makes it easy to add custom parser implementations without having to maintain a separate copy of the full Tika configuration file. You could for example create a my-custom-parsers.jar file with a META-INF/services/o.a.tika.parser.Parser file that lists only your custom parser classes. When you add that jar to the classpath, Tika would then automatically pick up those parsers in addition to the standard parser classes from the tika-parsers jar."

This is exactly what I tried, but it did not work. I'm using Tika 1.1.