Re: WordDelimiterFilter and the dot character

2012-10-17 Thread dirk
Hi,

I had a very similar problem while searching in a bibliographic field called
"signatur". I was able to solve it with the help of additional filter classes. At
the moment I use the following filters, and with that it works for me:

[analyzer configuration stripped by the list archive]
I added the MappingCharFilterFactory for better support of German umlauts
("Umlaute"). Concerning the wildcards: it is important that you use the
ReversedWildcardFilterFactory at index time only. All the other filters I also use
at query time.
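Roughly, such a chain could look like this (a sketch - the tokenizer and the
mapping file name are assumptions, since the archive stripped the original
configuration):

<analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory"/> <!-- index time only -->
</analyzer>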
Perhaps this helps.
Dirk



-
first experiences with SOLR and VuFind
--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiterFilter-and-the-dot-character-tp4014220p4014225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to boost query term after tokenizer

2012-10-17 Thread dirk
Hi,

I think that's not the way you can do it, because at runtime you cannot give your
analyzer a hint about which text fragment is more relevant than another. There is
no marker, so a filter cannot know which terms to boost. You could write your own
filter and let it read a file of important terms to compare against your query
terms, but I don't think that would be a good approach.

If you have a way to split the search query text into relevant terms, the first
step is done: that gives you a query-time analysis that searches with the right
terms. On the index side, you can try to pre-process your data and store the most
important keywords in separate search fields. Then you boost those fields at query
time.
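A minimal sketch with edismax (the field name "keywords" is made up; it stands for
the pre-processed field holding the important terms):

q=some query text&defType=edismax&qf=text keywords^5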
Hope this helps, Dirk



-
first experiences with SOLR and VuFind
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-boost-query-term-after-tokenizer-tp4010889p4014245.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Building an enterprise quality search engine using Apache Solr

2012-10-19 Thread dirk
Hi,
your question is not easy to answer. It depends on so many things that there is no
standard way to build an enterprise solution, and the time planning depends on
just as many factors.

I can try to give you some brief notes about our solution, although there may be
differences in target group and data sources. I am technically responsible for the
system disco (a research and discovery system) at the university library in
Münster. (Excuse me, I don't want to make a promotion tour here; I earn no money
with such activities. :-)) In this search engine, based on Lucene, we search about
200 million articles, books, journals and so on, so we have data sources that
differ both in structure and in the way they are delivered. At the beginning we
thought: let's buy a solution, in order to avoid as much in-house development as
possible. So we bought a commercial search engine that works on a Lucene core with
a proprietary business logic layer on top. So far so good - or not so good. At
that time I was the only person on the project, and I needed nearly one and a half
years full-time to implement most of the features and requirements. The reason it
took so long is not that I lacked experience (I hope). I have worked in this area
for nearly 15 years in different companies, always as a J2EE developer. (That's
rare today, because every experienced developer wants to work as a "leader" or
manager; it sounds better, and fewer project leaders get outsourced. OK, different
topic.) And other universities (customers) who built a comparable search engine in
that environment took as long or longer. So I remain hopeful...

In Germany we say "der Teufel steckt im Detail" (literally: the devil is in the
details), which means you start working and, in parallel, the requirements change -
sadly, in most cases after development has already laid the software foundation.
For example, we needed a lot of time for fine-tuning the ranking and for building
a fully automatic mechanism to update the data sources. And it is one thing to get
the search working in development and run a first developer test; it is a
completely different thing to make the system fit for 24/7 service and run a
production system without problems.

We spent most of our time on data pre-processing, because of the "shit in - shit
out" problem. Working on data quality is expensive, but you get no appreciation
for it, because everybody is occupied with search features. This requirement
showed us that it is mostly impossible to avoid in-house development completely.

The next thing is the user interface: not every feature a customer knows from good
old database-backed systems is easy to realize in a search engine, because of its
more or less flat data structure. So we had to develop one service after another
to read additional information - in our case, for example, real-time holdings
information from our library.

To summarize: if you want to estimate a concrete duration for building a complete,
production-ready enterprise search solution, you should talk to some people with
similar solutions, think through your own requirements in detail, and then
multiply your estimate by 2. Then you may have a realistic estimate.
Dirk   



-
my developer logs 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Building-an-enterprise-quality-search-engine-using-Apache-Solr-tp4014557p4014688.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "diversity" of search results?

2012-10-19 Thread dirk
Hi Paul,

yes, that's a typical problem when configuring a search engine. A solution depends
on your data. Sometimes you can overcome it by fine-tuning your search engine at
the boosting level. That's not easy and always based on trial-and-error tests.

Another thing you can do is build a data pre-processing step that compensates for
the causes of similar content in certain fields, e.g. a title field. For example,
if you have products with very similar titles and you boost such a field, the
result is that you will always find all of those documents in the result list. But
if you go on and add some information (perhaps from other search fields) to this
title field, you can perhaps reduce the similarity. (A typical example in my
branch: book titles in different volumes; there I add the volume number and the
year to the title field.)

Perhaps it is also necessary to cope with deduplication as a pre-processing step.
Here you can find an entry point:
http://wiki.apache.org/solr/Deduplication
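For reference, that page configures an update processor chain roughly like this
(the signature field and the field list here are just examples):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,volume,year</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>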

Dirk

   



-
my developer logs 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/diversity-of-search-results-tp4014692p4014696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Extended Dismax Query Parser with AND as default operator

2015-06-18 Thread Dirk Buchhorn
Hello,

I have a question about the extended dismax query parser. If the default operator
is changed to AND (q.op=AND), then the search results seem to be incorrect. I will
explain with some examples. For this test I use Solr v5.1 and the tika core from
the example directory.
== Preparation ==
Add the following lines to the schema.xml file (the markup was stripped by the
list archive; presumably an author field and the unique key):

  [field definition for "author" stripped]
  <uniqueKey>id</uniqueKey>

Change the field "text" to stored="true".
Remove the multiValued attribute from the title and text fields (we don't need
multivalued fields in our test).

Add test data (use curl or fiddler)
URL: http://localhost:8983/solr/tika/update/json?commit=true
Header: Content-type: application/json
[
  {"id":"1", "title":"green", "author":"Jon", "text":"blue"},
  {"id":"2", "title":"green", "author":"Jon Jessie", "text":"red"},
  {"id":"3", "title":"yellow", "author":"Jessie", "text":"blue"},
  {"id":"4", "title":"green", "author":"Jessie", "text":"blue"},
  {"id":"5", "title":"blue", "author":"Jon", "text":"yellow"},
  {"id":"6", "title":"red", "author":"Jon", "text":"green"}
]
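For example with curl (a sketch; this assumes the JSON above is saved as docs.json):

curl 'http://localhost:8983/solr/tika/update/json?commit=true' \
  -H 'Content-type: application/json' --data-binary @docs.json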

== Test ==
The following parameters are always set.
default operator is AND: q.op=AND
use the extended dismax query parser: defType=edismax
set the default query fields to title and text: qf=title text
sort: id asc

=== #1 test ===
q=red green
response:
{ "numFound":2,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"6","title":"red","author":"Jon","text":"green"}]
}
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red))~2)"

This test works as expected.

=== #2 test ===
We use a group
q=(red green)
Same response as test one.
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red))~2)"

This test works as expected.

=== #3 test ===
q=green red author:Jessie
response:
{ "numFound":1,"start":0,
  "docs":[{"id":"2","title":"green","author":"Jon Jessie","text":"red"}]
}
parsedquery_toString: "+(((text:green | title:green) (text:red | title:red) 
author:jessie)~3)"

This test works as expected.

=== #4 test ===
q=(green red) author:Jessie
response:
{ "numFound":2,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"4","title":"green","author":"Jessie","text":"blue"}]
}
parsedquery_toString: "+text:green | title:green) (text:red | title:red)) 
author:jessie)~2)"

The same result as in the 3rd test was expected. Why is AND not applied to the
query group?

=== #5 test ===
q=(+green +red) author:Jessie
response:
{ "numFound":4,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"3","title":"yellow","author":"Jessie","text":"blue"},
{"id":"4","title":"green","author":"Jessie","text":"blue"},
{"id":"6","title":"red","author":"Jon","text":"green"}]
}
parsedquery_toString: "+((+(text:green | title:green) +(text:red | title:red)) 
author:jessie)"

Now AND is used inside the group, but the author clause is combined with OR. Why?

=== #6 test ===
q=(+green +red) +author:Jessie
response:
{ "numFound":3,"start":0,
  "docs":[
{"id":"2","title":"green","author":"Jon Jessie","text":"red"},
{"id":"3","title":"yellow","author":"Jessie","text":"blue"},
{"id":"4","title":"green","author":"Jessie","text":"blue"}]
}
parsedquery_toString: "+((+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)"

Still not the expected result.

=== #7 test ===
q=+(+green +red) +author:Jessie
response:
{ "numFound":1,"start":0,
  "docs":[{"id":"2","title":"green","author":"Jon Jessie","text":"red"}]
}
parsedquery_toString: "+(+(+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)"

Now the result is OK. But if all operators must be given explicitly, then q.op=AND
is useless.

=== #8 test ===
q=green author:(Jon Jessie)
Four results are found; one was expected. The query must be changed to '+green
+author:(+Jon +Jessie)' to get the expected result.

Is this a bug in the extended dismax parser, or what is the reason for not
consistently applying q.op=AND to the query expression?

Kind regards

Dirk Buchhorn


EarlyTerminatingCollectorException

2014-11-05 Thread Dirk Högemann
Our production Solr slave cores (we have about 40 cores, each of moderate size,
about 10K to 90K documents) produce many exceptions of this type:

2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
org.apache.solr.search.SolrCache: Error during auto-warming of
key:org.apache.solr.search.QueryResultKey@62340b01
:org.apache.solr.search.EarlyTerminatingCollectorException

Our relevant solrconfig is (most element names were stripped by the list archive;
what survives are the values 18 and 2, and three caches, each configured as):

  class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="4096"

What exactly does the exception mean?
Thank you!

-- Dirk --


Re: EarlyTerminatingCollectorException

2014-11-06 Thread Dirk Högemann
https://issues.apache.org/jira/browse/SOLR-6710

2014-11-05 21:56 GMT+01:00 Mikhail Khludnev :

> I wondered too, but it seems it warms up the queryResultCache
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522
>
> At least these ERRORs break nothing, see
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165
>
> Anyway, here are two usability issues:
>  - key:org.apache.solr.search.QueryResultKey@62340b01 - the lack of a readable
>    toString()
>  - I don't think regeneration exceptions are ERRORs; they seem like WARNs to me,
>    or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions in
>    particular could be recognized and even ignored, given
>    SolrIndexSearcher.java#L522
>
> Would you mind raising a ticket?
>
> On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann  wrote:
>
> > Our production Solr slave cores (we have about 40 cores, each of moderate
> > size, about 10K to 90K documents) produce many exceptions of this type:
> >
> > 2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
> > org.apache.solr.search.SolrCache: Error during auto-warming of
> > key:org.apache.solr.search.QueryResultKey@62340b01
> > :org.apache.solr.search.EarlyTerminatingCollectorException
> >
> > Our relevant solrconfig is (most element names were stripped by the list
> > archive; what survives are the values 18 and 2, and three caches, each
> > configured as):
> >
> >   class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="4096"
> >
> > What exactly does the exception mean?
> > Thank you!
> >
> > -- Dirk --
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Solr4.2 PostCommit EventListener not working on Replication-Instances

2013-07-25 Thread Dirk Högemann
Hello,

I have implemented a Solr EventListener that should be fired after committing.
This works fine on the Solr master instance, and it also worked in Solr 3.5 on any
slave instance. I upgraded my installation to Solr 4.2, and now the postCommit
event is no longer fired on the replication (slave) instances, which is a huge
problem, as other caches have to be invalidated when replication has taken place.
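For reference, such a listener is registered in the update handler roughly like
this (the class name here is made up):

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="com.example.CacheInvalidatingListener"/>
</updateHandler>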

This is my configuration in solrconfig.xml on the slaves (the listener definition
itself was stripped by the list archive; the replication section is reconstructed
from the surviving values):

  [updateHandler with the postCommit listener; markup stripped]

  ...

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://localhost:9101/solr/Core1</str>
      <str name="pollInterval">00:03:00</str>
    </lst>
  </requestHandler>

Any hints?

Best regards


Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Hi,

I am not sure if I am missing something, or maybe I do not exactly understand the
index/search analyzer definitions and how they are executed.

I have a field definition like this (markup reconstructed from the fragments the
list archive left):

<fieldType name="cl2Tokenized_string" class="solr.TextField"
 sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
  </analyzer>
  <analyzer type="search">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
  </analyzer>
</fieldType>

Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:

<dynamicField name="cl2*" type="cl2Tokenized_string" indexed="true" stored="true" />

When I try to search for a token in that sense, the query is tokenized at
whitespace:

filter query: {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von
Steinen und Erden, sonstiger Bergbau
parsed: +cl2Categories_NACE:08 +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

I expected the query parser to tokenize ONLY at the pattern ###, instead of using
a whitespace tokenizer here.
Is it possible to define a filter query, without using phrases, that achieves the
desired behavior?
Or maybe local parameters are not the way to go here?

Best
Dirk


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
filter query: {!q.op=AND df=cl2Categories_NACE}08 Gewinnung von Steinen und Erden,
sonstiger Bergbau
parsed: +cl2Categories_NACE:08 +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

That is the relevant debug output from the query.

2012/12/17 Dirk Högemann 

> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly understand
> the index/search analyzer definitions and how they are executed.
>
> I have a field definition like this:
>
> <fieldType name="cl2Tokenized_string" class="solr.TextField"
>  sortMissingLast="true" omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>   </analyzer>
>   <analyzer type="search">
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>   </analyzer>
> </fieldType>
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
> <dynamicField name="cl2*" type="cl2Tokenized_string" indexed="true" stored="true" />
>
> When I try to search for a token in that sense, the query is tokenized at
> whitespace:
>
> filter query: {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung
> von Steinen und Erden, sonstiger Bergbau
> parsed: +cl2Categories_NACE:08 +cl2Categories_NACE:gewinnung
> +cl2Categories_NACE:von +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser to tokenize ONLY at the pattern ###, instead of
> using a whitespace tokenizer here.
> Is it possible to define a filter query, without using phrases, that achieves
> the desired behavior?
> Or maybe local parameters are not the way to go here?
>
> Best
> Dirk
>


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
OK, right - changed that... Nevertheless, I thought I should always use the same
analyzers for the query and the index section, to get consistent results.
Does this mean that the tokenizer in the query section will always be ignored by
the given query parsers?



2012/12/17 Jack Krupansky 

> The query parsers normally tokenize on whitespace and query operators, but you
> can escape any whitespace with a backslash or put the text in quotes, and then
> it will be tokenized by the analyzer rather than the query parser.
>
> Also, you have:
>
> <analyzer type="search">
>
> Change "search" to "query", but that won't change your problem, since Solr
> defaults to using the "index" analyzer if it doesn't "see" a "query" analyzer.
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 5:59 AM
> To: solr-user@lucene.apache.org
> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
> whitespace?
>
>
> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly understand
> the index/search analyzer definitions and how they are executed.
>
> I have a field definition like this:
>
> <fieldType name="cl2Tokenized_string" class="solr.TextField"
>  sortMissingLast="true" omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>   </analyzer>
>   <analyzer type="search">
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>   </analyzer>
> </fieldType>
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
> <dynamicField name="cl2*" type="cl2Tokenized_string" indexed="true" stored="true" />
>
> When I try to search for a token in that sense, the query is tokenized at
> whitespace:
>
> filter query: {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung
> von Steinen und Erden, sonstiger Bergbau
> parsed: +cl2Categories_NACE:08 +cl2Categories_NACE:gewinnung
> +cl2Categories_NACE:von +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser to tokenize ONLY at the pattern ###, instead of
> using a whitespace tokenizer here.
> Is it possible to define a filter query, without using phrases, that achieves
> the desired behavior?
> Or maybe local parameters are not the way to go here?
>
> Best
> Dirk
>


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Ah - now I got it. My solution to this was to use phrase queries; now I know why.
Thanks!
2012/12/17 Jack Krupansky 

> No, the "query" analyzer tokenizer will simply be applied to each term or
> quoted string AFTER the query parser has already parsed it. You may have
> escaped or quoted characters which will then be seen by the analyzer
> tokenizer.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always
> at whitespace?
>
>
> Ok- right, changed that... Nevertheless I thought I should always use the
> same analyzers for the query and the index section to have consistent
> results.
> Does this mean that the tokenizer in the query section will always be
> ignored by the given query parsers?
>
>
>
> 2012/12/17 Jack Krupansky 
>
>> The query parsers normally tokenize on whitespace and query operators, but you
>> can escape any whitespace with a backslash or put the text in quotes, and then
>> it will be tokenized by the analyzer rather than the query parser.
>>
>> Also, you have:
>>
>> <analyzer type="search">
>>
>> Change "search" to "query", but that won't change your problem, since Solr
>> defaults to using the "index" analyzer if it doesn't "see" a "query" analyzer.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Dirk Högemann
>> Sent: Monday, December 17, 2012 5:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
>> whitespace?
>>
>>
>> Hi,
>>
>> I am not sure if I am missing something, or maybe I do not exactly understand
>> the index/search analyzer definitions and how they are executed.
>>
>> I have a field definition like this:
>>
>> <fieldType name="cl2Tokenized_string" class="solr.TextField"
>>  sortMissingLast="true" omitNorms="true">
>>   <analyzer type="index">
>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>>   </analyzer>
>>   <analyzer type="search">
>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>>   </analyzer>
>> </fieldType>
>>
>> Any field starting with cl2 should be recognized as being of type
>> cl2Tokenized_string:
>> <dynamicField name="cl2*" type="cl2Tokenized_string" indexed="true" stored="true" />
>>
>> When I try to search for a token in that sense, the query is tokenized at
>> whitespace:
>>
>> filter query: {!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung
>> von Steinen und Erden, sonstiger Bergbau
>> parsed: +cl2Categories_NACE:08 +cl2Categories_NACE:gewinnung
>> +cl2Categories_NACE:von +cl2Categories_NACE:steinen +cl2Categories_NACE:und
>> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
>> +cl2Categories_NACE:bergbau
>>
>> I expected the query parser to tokenize ONLY at the pattern ###, instead of
>> using a whitespace tokenizer here.
>> Is it possible to define a filter query, without using phrases, that achieves
>> the desired behavior?
>> Or maybe local parameters are not the way to go here?
>>
>> Best
>> Dirk
>>
>>
>


Re: Bad performance while query pdf solr documents

2012-12-23 Thread Dirk Högemann
You can define the fields to be returned with the fl parameter
(fl=field1,field2,...) - usually just the score and the id, for example:
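http://localhost:8983/solr/select?q=*:*&fl=id,score

(a sketch - adjust host, core, query and field names to your setup)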

2012/12/23 uwe72 

> hi
>
> i am indexing pdf documents into solr via tika.
>
> when i run the query in the client with solrj, the performance is very bad
> (40 seconds to load 100 documents).
>
> Probably because it loads all the content. I don't need the content - how can
> i tell the query not to load it?
>
> Or are there other reasons why the performance is so bad?
>
> Regards
> Uwe
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Bad-performance-while-query-pdf-solr-documents-tp4028766.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Bad performance while query pdf solr documents

2012-12-23 Thread Dirk Högemann
Do you really need them all in the response to show them in the results? As you
are now defining them as not stored, it does not seem so. Otis's suggestion of
request handler defaults would look roughly like this (handler name and fields
are just an example):
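<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fl">id,score</str>
  </lst>
</requestHandler>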


2012/12/23 Otis Gospodnetic 

> Hi,
>
> You can specify them in solrconfig.xml for your request handler, so you
> don't have to specify it for each query unless you want to override fl.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 23, 2012 4:39 AM, "uwe72"  wrote:
>
> > we have more than a hundred fields... i don't want to put them all into the
> > fl parameter.
> >
> > is there another way, like saying "return all fields except these"?
> >
> > anyhow, i will change the field from stored to stored=false in the schema.
> > anyhow i will change the field from stored to stored=false in the schema.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Bad-performance-while-query-pdf-solr-documents-tp4028766p4028816.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Highlighting problems

2013-03-11 Thread Dirk Wintergruen
Hi all,

I have problems with the highlighting mechanism:

The query is:

http://127.0.0.1:8983/solr/mpiwgweb/select?facet=true&facet.field=description&facet.field=lang&facet.field=main_content&start=0&q=meier+AND+%28description:member+OR+description:project%29


after that:

In the field "main_content" which is the default search field. 

"meier" as well  as as "member" and "project" is highlighted, although im 
searching for member and project only in the field description.

The search results are ok, as far as I can see.


my settings (request handler definition; the element names were stripped by the
list archive - what survives are the values: explicit, 10, 300, on, main_content,
html, 200, 2, true, and a tvComponent entry)



Cheers
Dirk



Re: AW: Highlighting problems

2013-03-11 Thread Dirk Wintergruen
Hi Andre,
thanks, this did the job. I also had to enable edismax and set the default
parameters there - otherwise there was no highlighting at all. Roughly, the
handler defaults now look like this (a sketch, not my exact config):
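   <str name="defType">edismax</str>
   <str name="df">main_content</str>
   <str name="hl.requireFieldMatch">true</str>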

Best
Dirk

Am 11.03.2013 um 13:59 schrieb André Widhani :

> Hi Dirk,
> 
> please check 
> http://wiki.apache.org/solr/HighlightingParameters#hl.requireFieldMatch - 
> this may help you.
> 
> Regards,
> André
> 
> ____
> Von: Dirk Wintergruen [dwin...@mpiwg-berlin.mpg.de]
> Gesendet: Montag, 11. März 2013 13:56
> An: solr-user@lucene.apache.org
> Betreff: Highlighting problems
> 
> Hi all,
> 
> I have problems with the highlighting mechanism:
> 
> The query is:
> 
> http://127.0.0.1:8983/solr/mpiwgweb/select?facet=true&facet.field=description&facet.field=lang&facet.field=main_content&start=0&q=meier+AND+%28description:member+OR+description:project%29
> 
> 
> after that:
> 
> In the field "main_content", which is the default search field, "meier" as
> well as "member" and "project" are highlighted, although I am searching for
> member and project only in the field description.
>
> The search results are OK, as far as I can see.
> 
> 
> my settings (request handler definition; the element names were stripped by
> the list archive - what survives are the values: explicit, 10, 300, on,
> main_content, html, 200, 2, true, and a tvComponent entry)
> 
> Cheers
> Dirk
> 


Phonetic search and matching

2012-02-06 Thread Dirk Högemann
Hi,

I have a question about phonetic search and matching in Solr.
In our application all the content of an article is written to a full-text search
field, which provides stemming and a phonetic filter (Cologne phonetic for
German). This is the relevant part of the index analyzer configuration (the
search analyzer is analogous; the filter class names are reconstructed from the
surviving attributes):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
 splitOnCaseChange="0"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2"/>
<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>

Unfortunately this sometimes results in strange, but explainable, matches.
For example:

The content field indexes the following string: Donnerstag von 13 bis 17 Uhr.

This results in a match if we search for "puf", as the Cologne phonetic code for
"puf" is 13. (As a consequence, the 13 is then also highlighted.)

Does anyone have an idea how to handle this in a reasonable way, so that a search
for "puf" does not match 13 in the content?

Thanks in advance!

Dirk


Re: Phonetic search and matching

2012-02-07 Thread Dirk Högemann
Thanks, Erick.
In the first place we thought of removing numbers with a pattern filter; setting
inject to false would have the "same" effect. If we want to be able to search for
numbers in the content, this solution will not work - but another field without
phonetic filtering, and searching in both fields, would be OK, right? Roughly like
this (field and type names are made up):
Dirk
Am 07.02.2012 14:01 schrieb "Erick Erickson" :

> What happens if you do NOT inject? Setting  inject="false"
> stores only the phonetic reduction, not the original text. In that
> case your false match on "13" would go away
>
> Not sure what that means for the rest of your app though.
>
> Best
> Erick
>
> On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
>  wrote:
> > Hi,
> >
> > I have a question on phonetic search and matching in solr.
> > In our application all the content of an article is written to a
> full-text
> > search field, which provides stemming and a phonetic filter (cologne
> > phonetic for german).
> > This is the relevant part of the configuration for the index analyzer
> > (search is analogous):
> >
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >  catenateAll="0" splitOnCaseChange="0"/>
> > <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
> > <filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic"
> >  inject="true"/>
> >
> > Unfortunately this results sometimes in strange, but also explainable,
> > matches.
> > For example:
> >
> > Content field indexes the following String: Donnerstag von 13 bis 17 Uhr.
> >
> > This results in a match, if we search for "puf"  as the result of the
> > phonetic filter for this is 13.
> > (As a consequence the 13 is then also highlighted)
> >
> > Does anyone has an idea how to handle this in a reasonable way that a
> > search for "puf" does not match 13 in the content?
> >
> > Thanks in advance!
> >
> > Dirk
>


Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Hello,

we use Solr 3.5 and Tika to index a lot of PDFs. The content of those PDFs is
searchable via a full-text search. The terms are also used to generate search
suggestions.

Unfortunately, pdfbox seems to insert a space character when there are soft
hyphens in the content of the PDF. Thus the extracted text is sometimes very
fragmented; for example, the word Medizin is extracted as "Me di zin". As a
consequence, the suggestions are often unusable and the search does not work as
expected.

Does anyone have a suggestion how to extract the content of PDFs containing soft
hyphens without fragmenting it?

Best
Dirk


Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Thanks so far. I will have a closer look at the PDF.

I tried the enableAutoSpace setting with PDFBox 1.6 - it did not work:

import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

PDFParser parser = new PDFParser();
parser.setEnableAutoSpace(false); // suppress the spaces pdfbox inserts
ContentHandler handler = new BodyContentHandler();

Output:
Va ri an te Creutz feldt-
Ja kob-Krank heit
Stel lung nah men des Ar beits krei ses Blut

Our suggest component and parts of our search are becoming hard to use because of
this. Any other ideas?

Best
Dirk


2012/2/10 Jan Høydahl 

> I think you need to control the parameter "enableAutoSpace" in PDFBox. There's
> a JIRA for it, but it depends on some Tika 1.1 stuff, as far as I can
> understand:
>
> https://issues.apache.org/jira/browse/SOLR-2930
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 10. feb. 2012, at 11:21, Dirk Högemann wrote:
>
> > Hello,
> >
> > we use Solr 3.5 and Tika to index a lot of PDFs. The content of those
> PDFs
> > is searchable via a full-text search.
> > Also the terms are used to make search suggestions.
> >
> > Unfortunately, pdfbox seems to insert a space character when there are soft
> > hyphens in the content of the PDF. Thus the extracted text is sometimes very
> > fragmented; for example, the word Medizin is extracted as "Me di zin". As a
> > consequence, the suggestions are often unusable and the search does not work
> > as expected.
> >
> > Does anyone have a suggestion how to extract the content of PDFs containing
> > soft hyphens without fragmenting it?
> >
> > Best
> > Dirk
>
>


Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Interestingly, the only tool I found that handled my PDF correctly was pdftotext.


2012/2/10 Robert Muir 

> On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann
>  wrote:
> >
> > Our suggest component and parts of our search are becoming hard to use
> > because of this. Any other ideas?
> >
>
> Looks like https://issues.apache.org/jira/browse/PDFBOX-371
>
> The title of the issue is a bit confusing (I don't think it should go
> to hyphen either!), but I think its the reason its being mapped to a
> space.
>
> --
> lucidimagination.com
>


Auto-Commit and failures / schema violations

2011-07-29 Thread Dirk Högemann
Hello,

we are running a large CMS with multiple customers, and we are now going to use
Solr for our search and indexing tasks. As we have a lot of users working
simultaneously in the CMS, we decided not to commit our changes programmatically
(we use StreamingUpdateSolrServer) on each add. Instead we are using the
autocommit functions in solrconfig.xml.

To be "reliable", we write a timestamp file on each "add" of a document to the
StreamingUpdateSolrServer. (In case of a crash we could restart indexing from that
timestamp.) Unfortunately, we don't know how to be sure that the add was
successful, as (for example) schema violations seem to be detected only on commit,
which is too late, because by then the timestamp has usually already been
overwritten.

So: are there any valid approaches to be sure that an add of a document has been
processed successfully?
Maybe: is it better to collect a list of documents to add and commit these,
instead of using the auto-commit function?

Thanks in advance for any help!
Dirk Högemann


Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Hello,

I am trying to make our search application ready for Solr 4.0 (Beta) and to work
out the tasks necessary to accomplish this. When I try to reindex our documents,
I get the following exception:

 auto commit error...:java.lang.UnsupportedOperationException: this codec
can only be used for reading
at
org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
at
org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
at
org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
at
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Is this a known bug, or is it maybe a classpath problem I am facing here?

Best
Dirk Hoegemann


Re: Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Perfect. I reindexed the whole index and everything worked fine. The exception was
just a little confusing.
Best
Dirk
Am 21.08.2012 14:39 schrieb "Jack Krupansky" :

> Did you explicitly run the IndexUpgrader before adding new documents?
>
> In theory, you don't have to do that, but... who knows for sure.
>
> While you wait for one of the hard-core Lucene guys to respond, you could
> try IndexUpgrader, if you haven't already.
>
> OTOH, if you are in fact reindexing (rather than reusing your old index),
> why not start with an empty 4.0 index?
>
> From CHANGES.TXT:
>
> - On upgrading to 4.0, if you do not fully reindex your documents,
>  Lucene will emulate the new flex API on top of the old index,
>  incurring some performance cost (up to ~10% slowdown, typically).
>  To prevent this slowdown, use oal.index.IndexUpgrader
>  to upgrade your indexes to latest file format (LUCENE-3082).
>
>  Mixed flex/pre-flex indexes are perfectly fine -- the two
>  emulation layers (flex API on pre-flex index, and pre-flex API on
>  flex index) will remap the access as required.  So on upgrading to
>  4.0 you can start indexing new documents into an existing index.
>  To get optimal performance, use oal.index.IndexUpgrader
>  to upgrade your indexes to latest file format (LUCENE-3082).
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Tuesday, August 21, 2012 9:17 AM
> To: solr-user@lucene.apache.org
> Subject: Auto commit exception in Solr 4.0 Beta
>
> Hello,
>
> I am trying to make our search application Solr 4.0 (Beta) ready and
> elaborate on the tasks necessary to accomplish this.
> When I try to reindex our documents I get the following exception:
>
> auto commit error...:java.lang.UnsupportedOperationException: this codec
> can only be used for reading
>     at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>     ... (stack trace as in the original message above) ...
>     at java.lang.Thread.run(Thread.java:662)
>
> Is this a known bug, or is it maybe a Classpath problem I am facing here?
>
> Best
> Dirk Hoegemann
>


solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
Hi,

I am trying to upgrade from Solr 3.5 to Solr 4.0.
I read the following in the example solrconfig:

[comment stripped by the archive; it recommends using a LimitTokenCountFilterFactory
in the fieldType definition as the replacement for the removed maxFieldLength
setting]
I tried that as follows (the markup was stripped by the archive; reconstructed
from the fragments quoted later in this thread):

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
     generateNumberParts="1" catenateWords="0" catenateNumbers="0"
     catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
     enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

A LimitTokenCountFilterFactory configured like that crashes the startup of the
corresponding core with the following exception (without the factory, the core
starts up fine):


17.10.2012 17:44:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "textgen": Plugin init failure for
[schema.xml] analyze
r/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
at org.apache.solr.schema.IndexSchema.(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 25 more
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at
org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory.init(LimitTokenCountFilterFactory.java:48)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:367)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:358)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:159)
... 29 more

Any ideas?

Best
Dirk


Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
:-) great solution... that will look funny in our production system. (So the
factory looks up args.get(maxTokenCountArg), i.e. args.get("10"), instead of the
value itself - which is null, hence the NumberFormatException: null.)
Am 17.10.2012 16:12 schrieb "Jack Krupansky" :

> Anybody want to guess what's wrong with this code:
>
> String maxTokenCountArg = args.get("maxTokenCount");
> if (maxTokenCountArg == null) {
>   throw new IllegalArgumentException("maxTokenCount is mandatory.");
> }
> maxTokenCount = Integer.parseInt(args.get(maxTokenCountArg));
>
> Hmmm... try this "workaround":
>
> <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="foo" foo="1"/>
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Wednesday, October 17, 2012 11:50 AM
> To: solr-user@lucene.apache.org
> Subject: solr4.0 LimitTokenCountFilterFactory NumberFormatException
>
> Hi,
>
> I am trying to upgrade from Solr 3.5 to Solr 4.0.
> I read the following in the example solrconfig:
>
> [comment stripped by the archive; it recommends LimitTokenCountFilterFactory
> instead of the removed maxFieldLength setting]
>
> I tried that as follows:
>
> <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>      generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>      catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>      enablePositionIncrements="true"/>
>   </analyzer>
> </fieldType>
>
> The LimitTokenCountFilterFactory configured like that crashes the startup
> of the corresponding core with the following exception (without the factory
> the core starts up fine):
>
> 17.10.2012 17:44:19 org.apache.solr.common.SolrException log
> SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
> failure for [schema.xml] fieldType "textgen": Plugin init failure for
> [schema.xml] analyzer/filter: null
>     at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>     ... (stack trace as in the original message above) ...
> Caused by: org.apache.solr.common.SolrException: Plugin init failure for
> [schema.xml] analyzer/filter: null
>     at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>     at org.apache.solr.schema.FieldTypePlu

Forwardslash delimiter.Solr4.0 query for path like /Customer/Content/*

2012-10-30 Thread Dirk Högemann
Hi,

I am currently upgrading from Solr 3.5 to Solr 4.0.

I used to have filter-based restrictions for my search, based on the paths of
documents in a content repository, e.g.
fq={!q.op=OR df=folderPath_}/customer/content/*

Unfortunately this does not work anymore, as Lucene now supports regexp searches,
delimiting the expression with forward slashes:
http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches

This leads to a parsed query which is of course not what is intended:

RegexpQuery(folderPath_:/standardlsg/) folderPath_:shareddocs
RegexpQuery(folderPath_:/personen/) folderPath_:*

Is there a possibility to make the example query above work without escaping the
"/" as "\/"? Otherwise I will have to parse all queries (coming from persisted
configurations in the repository) and escape the relevant parts of the queries on
that field, which is somewhat ugly...
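Escaped, the example filter from above would have to look something like this:

fq={!q.op=OR df=folderPath_}\/customer\/content\/*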

The field I search on is of type:

[field type definition stripped by the list archive]

Best and thanks for any hints
Dirk


Re: Forwardslash delimiter.Solr4.0 query for path like /Customer/Content/*

2012-11-01 Thread Dirk Högemann
OK. If there is no other way, I will have some string parsing to do, but in this
case I am wondering a little about the chosen delimiter... as it is central to
nearly every path in directories, web resources, etc., right?
Best
Dirk
Am 30.10.2012 19:16 schrieb "Jack Krupansky" :

> Maybe a custom search component that runs before the QueryComponent and
> does the escaping?
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Tuesday, October 30, 2012 1:07 PM
> To: solr-user@lucene.apache.org
> Subject: Forwardslash delimiter.Solr4.0 query for path like
> /Customer/Content/*
>
> Hi,
>
> I am currently upgrading from Solr 3.5 to Solr 4.0.
>
> I used to have filter-based restrictions for my search, based on the paths
> of documents in a content repository, e.g.
> fq={!q.op=OR df=folderPath_}/customer/content/*
>
> Unfortunately this does not work anymore, as Lucene now supports regexp
> searches, delimiting the expression with forward slashes:
> http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches
>
> This leads to a parsed query which is of course not what is intended:
>
> RegexpQuery(folderPath_:/standardlsg/) folderPath_:shareddocs
> RegexpQuery(folderPath_:/personen/) folderPath_:*
>
> Is there a possibility to make the example query above work without
> escaping the "/" as "\/"?
> Otherwise I will have to parse all queries (coming from persisted
> configurations in the repository) and escape the relevant parts of the
> queries on that field, which is somewhat ugly...
>
> The field I search on is of type:
>
> [field type definition stripped by the list archive]
>
> Best and thanks for any hints
> Dirk
>


Empty shard1 - -:{"shard1":[]} cannot add new replicas

2021-02-05 Thread Dirk Wintergruen

Dear all,
I cannot add or remove any replicas of one collection. Diagnostics in the log file
show an empty shard:

  "gmpg-fulltext3":{"shard1":[]},

see below. What can I do?


AutoScaling.error.diagnostics.3897285248187441  {
  "sortedNodes":[{
  "node":"gmpg-services8.mpiwg-berlin.mpg.de:48983_solr",
  "isLive":true,
  "cores":3.0,
  "freedisk":466.3022346496582,
  "totaldisk":3814.24609375,
  "replicas":{
"gmpg-fulltext3":{"shard1":[]},
"gmpg-db":{"shard1":[{
  "core_node20":{
"core":"gmpg-db_shard1_replica_n19",
"shard":"shard1",
"collection":"gmpg-db",
"node_name":"gmpg-services8.mpiwg-berlin.mpg.de:48983_solr",
"type":"NRT",

"base_url":"http://gmpg-services8.mpiwg-berlin.mpg.de:48983/solr";,
"state":"down",
"force_set_state":"false",
"INDEX.sizeInGB":0.06685098074376583}}]},
"abstracts4":{"shard1":[{
  "core_node4":{
"core":"abstracts4_shard1_replica_n3",
"shard":"shard1",
"collection":"abstracts4",
"node_name":"gmpg-services8.mpiwg-berlin.mpg.de:48983_solr",
"type":"NRT",
"leader":"true",

"base_url":"http://gmpg-services8.mpiwg-berlin.mpg.de:48983/solr";,
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":22.7537336172536}}]},
"gmpg-fulltext-dev":{"shard1":[{
  "core_node2":{
"core":"gmpg-fulltext-dev_shard1_replica_n1",
"shard":"shard1",
"collection":"gmpg-fulltext-dev",
"node_name":"gmpg-services8.mpiwg-berlin.mpg.de:48983_solr",
"type":"NRT",
"leader":"true",

"base_url":"http://gmpg-services8.mpiwg-berlin.mpg.de:48983/solr";,
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":1.3224780559539795E-7}}]}}}



Error message is:
2021-02-05 10:57:41.181 ERROR
(OverseerThreadFactory-23-thread-3-processing-n:gmpg-services8.mpiwg-berlin.mpg.de:58983_solr)
 [c:gmpg-fulltext3 s:shard1  ] o.a.s.c.a.c.OverseerCollectionMessageHandler
Collection: gmpg-fulltext3 operation: addreplica
failed:org.apache.solr.cloud.api.collections.Assign$AssignmentException: Error
getting replica locations :  No node can satisfy the rules "[] More details from
logs in node : gmpg-services8.mpiwg-berlin.mpg.de:58983_solr, errorId :
AutoScaling.error.diagnostics.3897285248187441"
at 
org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:394)
at 
org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630)
at 
org.apache.solr.cloud.api.collections.Assign.getNodesForNewReplicas(Assign.java:368)
at 
org.apache.solr.cloud.api.collections.AddReplicaCmd.buildReplicaPositions(AddReplicaCmd.java:370)
at 
org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:156)
at 
org.apache.solr.cloud.api.collections.AddReplicaCmd.call(AddReplicaCmd.java:93)
at 
org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
at 
org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException:  No node can satisfy the rules
"[] More details from logs in node :
gmpg-services8.mpiwg-berlin.mpg.de:58983_solr, errorId :
AutoScaling.error.diagnostics.3897285248187441"
at 
org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:185)
at 
org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382)
... 11 more




Cheers
Dirk

--
--
Dr.-Ing. Dirk Wintergrün

Max-Planck-Institut für Wissenschaftsgeschichte
Max Planck Institute for the History of Science

Department I / Digital and Computational Humanities

Boltzmannstr. 22
14195 Berlin

+49 20 22 66 7108
dwin...@mpiwg-berlin.mpg.de




