Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Michael Sokolov
Yes, Congratulations and a big thank you Jan! On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta wrote: > > Hi everyone, > > I’d like to inform everyone that the newly formed Apache Solr PMC nominated > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice > President. This decision

Re: highlighting the boolean query

2015-02-24 Thread Michael Sokolov
There is also PostingsHighlighter -- I recommend it, if only for the performance improvement, which is substantial, but I'm not completely sure how it handles this issue. The one drawback I *am* aware of is that it is insensitive to positions (so words from phrases get highlighted even in isol

Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Michael Sokolov
On 02/17/2015 03:46 AM, Volkan Altan wrote: First of all thank you for your answer. You're welcome - thanks for sending a more complete example of your problem and expected behavior. I don’t want to use KeywordTokenizer. Because, as long as the compound words written by the user are availabl

Re: Solr suggest is related to second letter, not to initial letter

2015-02-15 Thread Michael Sokolov
StandardTokenizer splits your text into tokens, and the suggester suggests tokens independently. It sounds as if you want the suggestions to be based on the entire text (not just the current word), and that only adjacent words in the original should appear as suggestions. Assuming that's what

Re: DIH: entities in xml problem

2015-02-04 Thread Michael Sokolov
ent insert (and the content has the entities) and it will be difficult to add the DTD to the content... Thanks - Raul On 03/02/15 at 17:15, Michael Sokolov wrote: If the entities are in the content, you would need to add the DTD to the content, not to the stylesheet. Or you could transfo

Re: DIH: entities in xml problem

2015-02-03 Thread Michael Sokolov
If the entities are in the content, you would need to add the DTD to the content, not to the stylesheet. Or you could transform the content converting the entities. -Mike On 02/03/2015 10:41 AM, Raul wrote: Hi all! I'm trying to use Solr with the DIH and xslt processing. All is fine till i

Re: How to exclude selected filter (facet) from search result?

2015-02-02 Thread Michael Sokolov
Have a look here: https://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams; it might answer your question. Typically what I recommend is to keep the selected facet in view, but without any limitation on its counts. However if you want to hide it altogether, I t

Re: Solr Suggester Autocomplete Working Example

2015-02-02 Thread Michael Sokolov
Please go ahead and play with autocomplete on safaribooksonline.com/home - if you are not a subscriber you will have to sign up for a free trial. We use the AnalyzingInfixSuggester. From your description, it sounds as if you are building completions from a field that you also use for searchin

Re: Solr Logging files get high

2015-02-02 Thread Michael Sokolov
I was tempted to suggest rehab -- but seriously it wasn't clear if Nitin meant the log files Michael is referring to, or the transaction log (tlog). If it's the transaction log, the solution is more frequent hard commits. -Mike On 2/2/2015 11:48 AM, Michael Della Bitta wrote: If you'd like

Re: [MASSMAIL]Re: "Contextual" sponsored results with Solr

2015-01-31 Thread Michael Sokolov
If you have a finite known set of hosts, you could do something truly awful: create a field for each distinct host and set all of them to have value={id of the document} except for the host to which the document belongs: assign that hostname field some constant value, like "true". Then query

Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov
On 1/31/2015 2:47 PM, Mikhail Khludnev wrote: Michael, Please check two questions inlined below Hi Mikhail, On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: You can only handle a single relation this way since you have to restructure your in

Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov
We were using grouping (no DocValues, though) and recently switched to using block-indexing and joins (see https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers). We got a nice speedup on average (perhaps 2x faster) and an even better improvement in t

Re: AW: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
commits to the Solr-transaction.log? -Clemens -Original Message- From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] Sent: Tuesday, 20 January 2015 14:54 To: solr-user@lucene.apache.org Subject: Re: transactions@Solr(J) On 1/20/2015 5:18 AM, Clemens Wyss DEV wr

Re: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote: http://stackoverflow.com/questions/10805117/solr-transaction-management-using-solrj Is it true, that a SolrServer-instance denotes a "transaction context"? Say I have two concurrent threads, each having a SolrServer-instance "pointing" to the same c

Re: Need Debug Direction on Performance Problem

2015-01-18 Thread Michael Sokolov
You can also implement your own cursor easily enough if you have a unique sortkey (not relevance score). Say you can sort by id, then you select batch 1 (50k docs, say) and record the last (maximum) id in the batch. For the next batch, limit it to id > last_id and get the first 50k docs (don't
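The batching idea in this message can be sketched in Python. This is only an illustration over an in-memory list, not a real Solr client: `fetch_batch`, `iterate_all`, and the `id` key are hypothetical stand-ins for a query sorted by a unique key with a filter such as id > last_id.

```python
from typing import Dict, Iterator, List, Optional


def fetch_batch(docs: List[Dict], last_id: Optional[int], rows: int) -> List[Dict]:
    """Hypothetical stand-in for a search request: keep only ids greater
    than last_id, sort by id, and return at most `rows` documents."""
    matching = [d for d in docs if last_id is None or d["id"] > last_id]
    return sorted(matching, key=lambda d: d["id"])[:rows]


def iterate_all(docs: List[Dict], rows: int) -> Iterator[List[Dict]]:
    """Yield successive batches, recording the maximum id seen so far."""
    last_id = None
    while True:
        batch = fetch_batch(docs, last_id, rows)
        if not batch:
            break
        yield batch
        # The batch is sorted by id, so its last element holds the maximum id.
        last_id = batch[-1]["id"]
```

Because each request filters on the sort key instead of skipping rows, the cost of a batch should not grow with how deep into the result set it is, provided the key is unique and stable.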

Re: Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name

2015-01-16 Thread Michael Sokolov
I've seen the same thing, poked around a bit and eventually decided to ignore it. I think there may be a ticket related to that saying it's a logging bug (ie not a real issue), but I couldn't swear to it. -Mike On 01/16/2015 12:36 PM, Tom Burton-West wrote: Hello, I'm running Solr 4.10.2 ou

Re: Occasionally getting error in solr suggester component.

2015-01-15 Thread Michael Sokolov
can avoid the rebuilt index on every commit or optimize. Is this the right way ?? or any that I missed ??? Regards dhanesh s.r On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: did you build the spellcheck index using spellcheck.build as descr

Re: Occasionally getting error in solr suggester component.

2015-01-14 Thread Michael Sokolov
d, Jan 14, 2015 at 12:47 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: I think you are probably getting bitten by one of the issues addressed in LUCENE-5889 I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, bui

Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Michael Sokolov
As a foolish dev (not malicious I hope!), I did mess around with something like this once; I was writing my own Codec. I found I had to create a file called META-INF/services/org.apache.lucene.codecs.Codec in my solr plugin jar that contained the fully-qualified class name of my codec: I guess

Re: Occasionally getting error in solr suggester component.

2015-01-13 Thread Michael Sokolov
I think you are probably getting bitten by one of the issues addressed in LUCENE-5889 I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true) -Mike

Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Michael Sokolov
It looks like this is a good starting point: http://wiki.apache.org/solr/SolrConfigXml#codecFactory -Mike On 01/12/2015 03:37 PM, Tom Burton-West wrote: Hello all, Our indexes have around 3 billion unique terms, so for Solr 3, we set TermIndexInterval to about 8 times the default. The net ef

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Michael Sokolov
On 12/30/14 12:42 PM, Jonathan Rochkind wrote: On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then "mixedCase" at query time will no longer match "mixed Case" in the in

Re: Multi Language Suggester Solr Issue

2014-12-28 Thread Michael Sokolov
I noticed that your suggester analyzers include which seems like a bad idea -- this will strip all those Arabic, Russian and Japanese characters entirely, leaving you with probably only whitespace in your tokens. Try just removing that? -Mike On 12/24/14 6:09 PM, alaa.abuzaghleh wrote: I

Re: converting to parent/child block indexing

2014-12-17 Thread Michael Sokolov
t 12:33 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: Have other people tried migrating an index that was created without block (parent/child) indexing to one that *does* have it? Did you find that you got duplicate documents - ie multiple documents with the same uniqueField value

converting to parent/child block indexing

2014-12-17 Thread Michael Sokolov
Have other people tried migrating an index that was created without block (parent/child) indexing to one that *does* have it? Did you find that you got duplicate documents - ie multiple documents with the same uniqueField value? That's what I found, and I don't see how that's possible. What

Re: questions about BlockJoinParentQParser

2014-12-17 Thread Michael Sokolov
Thanks Andrey! I voted for your patch -Mike On 12/17/2014 4:01 AM, Kydryavtsev Andrey wrote: To support the scoreMode parameter in BlockJoinParentQParser, we have this JIRA with an attached patch https://issues.apache.org/jira/browse/SOLR-5882 17.12.2014, 06:54, "Michael Sokolov" : I

questions about BlockJoinParentQParser

2014-12-16 Thread Michael Sokolov
I'm trying to use BJPQP and ran into a few little gotchas that I'd like to share with y'all in case you have any advice. First I ran into an NPE that probably should be handled better - maybe just an exception with a better message. The framework I'm working in makes it slightly annoying to u

Re: My new lemmatizer interfers with the highlighter

2014-12-15 Thread Michael Sokolov
Well I think your first step should be finding a reproducible test case and encoding it as a unit test. But I suspect ultimately the fix will be something to do with positionIncrement ... -Mike On 12/15/2014 09:08 AM, Erlend Garåsen wrote: On 15.12.14 14:11, Michael Sokolov wrote: I'

Re: My new lemmatizer interfers with the highlighter

2014-12-15 Thread Michael Sokolov
I'm not sure, but is it necessary to set positionIncAttr to 1 when there are *not* any lemmas found? I think the usual pattern is to call clearAttributes() at the start of incrementToken -Mike On 12/15/14 7:38 AM, Erlend Garåsen wrote: I have written a dictionary-based lemmatizer for Universi

Re: different fields for user-supplied phrases in edismax

2014-12-13 Thread Michael Sokolov
e case? What do mean by "controlling the fields used for phrase queries" ? Rgds AJ On 12-Dec-2014, at 20:11, Michael Sokolov wrote: Doug - I believe pf controls the fields that are used for the phrase queries *generated by the parser*. What I am after is controlling the fields use

Re: different fields for user-supplied phrases in edismax

2014-12-13 Thread Michael Sokolov
I want terms to be stemmed, unless they are quoted, using dismax. On 12/12/14 8:19 PM, Amit Jha wrote: Hi Mike, What is exact your use case? What do mean by "controlling the fields used for phrase queries" ? Rgds AJ On 12-Dec-2014, at 20:11, Michael Sokolov wrote: Doug - I

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
, I typically solve this problem by using a copyField and running different analysis on the destination field. Then you could use this field as pf instead of qf. If I recall, fields in pf must also be mentioned in qf for this to work. -Doug On Fri, Dec 12, 2014 at 8:13 AM, Michael Sokolov < ms

Re: different fields for user-supplied phrases in edismax

2014-12-12 Thread Michael Sokolov
t. Ahmet On Thursday, December 11, 2014 10:50 PM, Michael Sokolov wrote: I'd like to supply a different set of fields for phrases than for bare terms. Specifically, we'd like to treat phrases as more "exact" - probably turning off stemming and generally having a tighter analysis

Re: Highlighting integer field

2014-12-11 Thread Michael Sokolov
So the short answer to your original question is "no." Highlighting is designed to find matches *within* a tokenized (text) field only. That is difficult because text gets processed and there are all sorts of complications, but for integers it should be pretty easy to match the values in the d

Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-11 Thread Michael Sokolov
Have you rebooted the machine? (last refuge of the clueless, but often works) ... On 12/11/14 2:50 PM, solr-user wrote: yes, have triple checked the schema and solrconfig XML; various tools have indicated the XML is valid no missing types or dupes, and have not disabled the admin handler as m

different fields for user-supplied phrases in edismax

2014-12-11 Thread Michael Sokolov
I'd like to supply a different set of fields for phrases than for bare terms. Specifically, we'd like to treat phrases as more "exact" - probably turning off stemming and generally having a tighter analysis chain. Note: this is *not* what's done by configuring "pf" which controls fields for t

Re: Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Michael Sokolov
Alex, I spent some time answering questions there, but ultimately got turned off by the competitive nature of it. I wanted to increase my score -- fun! But if you are not watching it all the time, the questions go by very fast, and you lose your edge. The typical pattern seems to be: so-so

Re: AW: AW: Keeping capitalization in suggestions?

2014-12-09 Thread Michael Sokolov
text_suggest 4 ... my schema.xml ... ... ... -Original Message- From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] Sent: Thursday, 4 December 2014 14:05 To: solr-user@lucene.apache.org Subject: Re: Keeping capitalization in suggestions? Have a look

Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Michael Sokolov
Right - allowing Solr to manage these queries (SOLR-6234) seems like the way to go ... OP == original poster (I lost track of who started the discussion) -Mike On 12/08/2014 10:19 AM, Mikhail Khludnev wrote: On Mon, Dec 8, 2014 at 5:38 PM, Michael Sokolov < msoko...@safaribooksonline.

Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Michael Sokolov
I get the impression there was a concern that the caller could hold on to the query generated by JoinUtil for too long - eg across requests in Solr. I'm not sure why the OP thinks that would happen, though. -Mike On 12/08/2014 04:57 AM, Mikhail Khludnev wrote: On Fri, Dec 5, 2014 at 10:44 PM,

Re: Get the new terms of fields since last update

2014-12-05 Thread Michael Sokolov
How about creating a new core that only holds a single week's documents, and retrieving all of its terms? Then each week, flush it and start over. -Mike On 12/05/2014 07:54 AM, lboutros wrote: Dear all, I would like to get the new terms of fields since last update (once a week). If I retriev

Re: Large fields storage

2014-12-04 Thread Michael Sokolov
There's no appreciable RAM cost during querying, faceting, sorting of search results and so on. Stored fields are separate from the inverted index. There is some cost in additional disk space required and I/O during merging, but I think you'll find these are not significant. The main cost we

Re: Keeping capitalization in suggestions?

2014-12-04 Thread Michael Sokolov
Have a look at AnalyzingInfixSuggester - it does what you want. -Mike On 12/4/14 3:05 AM, Clemens Wyss DEV wrote: When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (uppercased). But what happens is If lowercasefilter (see

Re: Problem with additional Servlet Filter (SolrRequestParsers Exception)

2014-12-03 Thread Michael Sokolov
Stefan I had problems like this -- and the short answer is -- it's a PITA. Solr is not really designed to be extended in this way. In fact I believe they are moving towards an architecture where this is even less possible - folks will be encouraged to run solr using a bundled exe, perhaps wit

Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote: Thanks for suggestions. Do I remember correctly that you ignored last Lucene Revolution? I wouldn't say I ignored it, but it's true I wasn't there in DC: I'm excited to catch up on the presentations as the videos become available, though. -Mike

Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov
Mikhail - I can imagine a filter that strips out everything but numbers and then indexes those with a (separate) numeric (trie) field. But I don't believe you can do phrase or other proximity queries across multiple fields. As long as an or-query is good enough, I think this problem is not to

Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
, Michael Sokolov wrote: Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 12/02/2014 12:59 PM, Dari

Re: Getting the position of a word via Solr API

2014-12-02 Thread Michael Sokolov
I would keep trying with the highlighters. Some of them, at least, have options to provide an external text source, although you will almost certainly have to write some java code to get this working; extend the highlighter you choose and supply its text from an external source. -Mike On 12

Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 12/02/2014 12:59 PM, Darin Amos wrote: Thanks! I will take a look at this. I do have an additional question, since after a

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

2014-11-29 Thread Michael Sokolov
On 11/29/14 1:30 PM, Toke Eskildsen wrote: Michael Sokolov [msoko...@safaribooksonline.com] wrote: I wonder if there's any value in providing this metric (total index size - stored field size - term vector size) as part of the admin panel? Is it meaningful? It seems like there would be

Re: Constantly high disk read access (40-60M/s)

2014-11-29 Thread Michael Sokolov
pularizers community: https://www.linkedin.com/groups?gid=6713853 On 29 November 2014 at 13:16, Michael Sokolov wrote: Of course testing is best, but you can also get an idea of the size of the non-storage part of your index by looking in the solr index folder and subtracting the size of the files cont

Re: Constantly high disk read access (40-60M/s)

2014-11-29 Thread Michael Sokolov
Of course testing is best, but you can also get an idea of the size of the non-storage part of your index by looking in the solr index folder and subtracting the size of the files containing the stored fields from the total size of the index. This depends of course on the internal storage stra
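A rough sketch of that subtraction in Python follows. It assumes stored fields live in files ending in `.fdt` and `.fdx`, which holds for the default Lucene 4.x codec when compound files are not in use; with compound (`.cfs`) segments or a different codec the result will not be meaningful. The function name and its parameters are hypothetical.

```python
import os
from typing import Iterable

# Assumption: default Lucene 4.x stored-field file extensions.
STORED_FIELD_EXTENSIONS = (".fdt", ".fdx")


def non_stored_index_size(index_dir: str,
                          stored_exts: Iterable[str] = STORED_FIELD_EXTENSIONS) -> int:
    """Return total bytes in index_dir minus bytes in stored-field files."""
    exts = tuple(stored_exts)
    total = 0
    stored = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        size = os.path.getsize(path)
        total += size
        if name.endswith(exts):
            stored += size
    return total - stored
```

Passing a longer tuple of extensions would let the same function exclude other files as well, for example term vector files.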

Re: updateNumericDocValue in solr 4.6.1

2014-11-26 Thread Michael Sokolov
Yes - here's a working example we have in production (tested in 4.8.1 and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1 I'm pretty sure): https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/processor/UpdateDocValuesProcessor.

Re: Fwd: Change in the Score of Similiar Documents

2014-11-25 Thread Michael Sokolov
Scores are related to total term frequencies *in each shard*, not globally, and I think they may include term counts from deleted documents as well, which could account for the discrepancy in scores across the two shards. -Mike On 11/25/14 3:22 AM, rashi gandhi wrote: Hi, I have created t

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
right -- missed Ahmet's answer there in my haste to respond ... -Mike On 11/25/14 6:56 AM, Ahmet Arslan wrote: Hi Apurv, I wouldn't worry about index size, increase in index size is not linear (2x) like that. Please see similar discussion : https://issues.apache.org/jira/browse/LUCENE-5620 A

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no standar

Re: matching shingles issue

2014-11-24 Thread Michael Sokolov
maybe try description_shingle:(Highest quality) On 11/24/14 1:46 PM, vit wrote: I have Solr 4.2.1 I am using the following analyser:

Re: Error while initializing EmbeddedSolrServer

2014-11-23 Thread Michael Sokolov
Those Spi classes rely on a configuration file that gets stored in the META-INF folder. I'm not familiar with how OSGi works, but I'm pretty sure that failure is because the file META-INF/services/org.apache.lucene.codecs.Codec (you'll see it in the lucene-core jar) can't be found -Mike On

Re: Handling intersection facets of many values

2014-11-20 Thread Michael Sokolov
If you're willing to write some Java you can do something more efficient by intersecting two terms enumerations: this works with constant memory for any number of values in two fields, basically like intersecting any two sorted lists, you leap frog between them. I have an example if you're int
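The leapfrog technique mentioned here can be illustrated on two plain sorted lists. The sketch below is in Python rather than Java and does not use Lucene's terms enumerations; `leapfrog_intersect` is a hypothetical name, and the binary-search "seek" stands in for advancing an enumeration to the next term at or after a target.

```python
from bisect import bisect_left
from typing import List


def leapfrog_intersect(a: List[str], b: List[str]) -> List[str]:
    """Intersect two sorted lists of unique terms by seeking each cursor
    forward to the other cursor's current term."""
    result = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Seek in a to the first term >= b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Seek in b to the first term >= a[i].
            j = bisect_left(b, a[i], j + 1)
    return result
```

Apart from the output list, only the two cursor positions are kept, which is what gives the constant-memory behaviour described in the message.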

Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
OK - please disregard; I found a rogue new component in our analyzer that was messing everything up. The hunspell behavior was perhaps a little confusing, but I don't believe it leads to broken queries. -Mike On 11/18/2014 02:38 PM, Michael Sokolov wrote: followup - hunspell has: f

problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
I find that a query for stemmed terms sometimes fails with the edismax query parser and hunspell stemmer. Looking at the output of analysis for the query (text:following) I can see that it generates two different terms at the same position: "follow" and "following". Then edismax seems to genera

Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
nerating multiple "stems" causes issues On 11/18/2014 02:33 PM, Michael Sokolov wrote: I find that a query for stemmed terms sometimes fails with the edismax query parser and hunspell stemmer. Looking at the output of analysis for the query (text:following) I can see that it generates two

Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
Mike On 11/14/14 2:01 AM, Walter Underwood wrote: We get no suggestions until we force a build with suggest.build=true. Maybe we need to define a spellchecker component to get that behavior? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 13, 2014, at

Re: DIH Blob data

2014-11-14 Thread Michael Sokolov
On 11/14/2014 01:43 PM, Erick Erickson wrote: Just skimming, so maybe I misinterpreted. ExternalFileField and ExternalFileFieldReloader refer to storing values for each doc in an external file, they have nothing to do with storing _files_. The usual pattern is to have Solr store just enough da

Re: DIH Blob data

2014-11-14 Thread Michael Sokolov
an use filter query like "fq=terms:a:1" On 2014. 11. 13. at 3:59 AM, "Michael Sokolov" wrote: We routinely store images and pdfs in Solr. There *is* a benefit, since you don't need to manage another storage system, you don't have to worry about Solr getting out of sync with

Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
4/14 2:01 AM, Walter Underwood wrote: We get no suggestions until we force a build with suggest.build=true. Maybe we need to define a spellchecker component to get that behavior? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 13, 2014, at 10:56 PM, Michael

Re: Suggest dictionaries not rebuilding after restart

2014-11-13 Thread Michael Sokolov
I believe the spellchecker component persists these indexes now and reloads them on restart rather than rebuilding. -Mike On 11/13/14 7:40 PM, Walter Underwood wrote: We have to manually rebuild the suggest dictionaries after a restart. This seems odd, since someone else had a problem because

Re: DIH Blob data

2014-11-12 Thread Michael Sokolov
We routinely store images and pdfs in Solr. There *is* a benefit, since you don't need to manage another storage system, you don't have to worry about Solr getting out of sync with the other system, you can use Solr replication for all your assets, etc. I don't use DIH, so personally I don't c

Re: How to suggest from multiple fields?

2014-11-11 Thread Michael Sokolov
The usual approach is to use copyField to copy multiple fields to a single field. I posted a solution using an UpdateRequestProcessor to merge fields, but with different analyzers, here: https://blog.safaribooksonline.com/2014/04/15/search-suggestions-with-solr-2/ My latest approach is this:

Re: Best practice: Autosuggest/autocomplete vs. "real search"

2014-11-10 Thread Michael Sokolov
The goal is to ensure that suggestions from autocomplete are actually terms in the main index, so that the suggestions will actually result in matches. You've considered expanding the main index by adding the suggestion n-grams to it, but it would probably be better to alter your suggester so

Re: Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Michael Sokolov
You didn't describe your analysis chain, but maybe you are using WordDelimiterFilter to break up hyphenated words? If so, it has a protwords.txt feature that lets you specify exceptions -Mike On 11/5/2014 5:36 PM, Michael Della Bitta wrote: Pretty sure what you need is called KeywordMarkerFil

Re: Missing log entries with log4j log rotation

2014-11-04 Thread Michael Sokolov
Shawn this is really weird -- we run log4j in lots of installations and have never seen an issue like this. I wonder if you might be running some other log rotation software (like logrotate) that is somehow getting in the way or conflicting? -Mike On 11/01/2014 01:45 PM, Shawn Heisey wrote:

Re: dynamically change default update chain

2014-11-03 Thread Michael Sokolov
Just to get the obvious sledgehammer solution out of the way - upload a new, edited solrconfig.xml with the default changed, and reload the core. -Mike On 11/3/14 6:28 AM, Dmitry Kan wrote: Hello solr fellows, I'm working on a project that involves using two update chains. One default chain

Re: function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
OK, I opened SOLR-6672; not sure how I stumbled into using white space; I would ordinarily use commas too, I think. -Mike On 10/29/14 1:23 PM, Chris Hostetter wrote: : fl="id field(units_used) archive_id" I didn't even realize until today that fl was documented to support space separated fiel

function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
I noticed that when you include a function as a result field, the corresponding key in the result markup includes trailing whitespace, which seems like a bug. I wonder if anyone knows if there is a ticket for this already? Example: fl="id field(units_used) archive_id" ends up returning resu

Re: AW: AW: AW: (auto)suggestions, but ony from a "filtered" set of documents

2014-10-27 Thread Michael Sokolov
really offer a solution to your problem, but there are some possibly helpful similarities: you will probably want to write a custom UpdateRequestProcessor, and you will want to feed the suggester with a custom Dictionary / InputIterator as I have done in that example. -Mike -Clemens -U

Re: AW: AW: (auto)suggestions, but ony from a "filtered" set of documents

2014-10-26 Thread Michael Sokolov
This project (https://github.com/safarijv/ifpress-solr-plugin/) has some examples of custom Solr UpdateRequestProcessors that feed a single suggester from multiple fields, applying different weights to them, using complete values from some and analyzing others into tokens. The first thing I di

Re: recip function error

2014-10-23 Thread Michael Sokolov
3.16e-11.0 looks fishy to me On 10/23/14 5:09 PM, eShard wrote: Good evening, I'm using solr 4.0 Final. I tried using this function boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) but it fails with this error: org.apache.lucene.queryparser.classic.ParseException: Expected ')' at posi

Re: update external file

2014-10-23 Thread Michael Sokolov
That's what I thought; thanks, Markus. On 10/23/14 2:19 PM, Markus Jelsma wrote: You either need to upload them and issue the reload command, or download them from the machine, and then issue the reload command. There is no REST support for it (yet) like the synonym filter, or was it stop filt

Re: update external file

2014-10-23 Thread Michael Sokolov
Thanks for the links, Ramzi. I had already read the wiki page, which merely talks about how to reload the file into memory once it has been updated on disk. It doesn't mention any support for uploading that I can see. Did I miss it? -Mike On 10/23/14 1:36 PM, Ramzi Alqrainy wrote: Of cour

update external file

2014-10-23 Thread Michael Sokolov
I've been looking at ExternalFileField to handle popularity boosting, since Solr updatable docvalues (SOLR-5944) isn't quite there yet. My question is whether there is any support for uploading the external file via Solr, or if people do that some other (external, I guess) way? -Mike

Re: OOV queries

2014-06-05 Thread Michael Sokolov
It seems as if 0-hit queries should be pretty fast since they can terminate very early? Are you seeing a big difference between first-time and subsequent (cached) no-match queries? -Mike On 6/5/2014 8:47 AM, Dmitry Kan wrote: Hi, Solr is good at caching: even if first "cold" query takes lo

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
this shard Best, Erick On Mon, Jun 2, 2014 at 4:27 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage diffe

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage difference between one 180MB document and 180 1 MB documents from an indexing perspective. Making the field "stored" is more likely to lead

Re: Uneven shard heap usage

2014-05-31 Thread Michael Sokolov
Is it possible that all your requests are routed to that single shard? I.e. you are not using the smart client that round-robins requests? I think that could cause all of the merging of results to be done on a single node. Also - is it possible you have a "bad" document in that shard? Like o

Re: Solr 4.8: Does eDisMax parser calls analyzer chain to tokenize?

2014-05-17 Thread Michael Sokolov
Alex - the query parsers generally accept an analyzer, which they must apply after they perform their own tokenization. Consider: how would a capitalized query term match lower-cased terms in the index without query analysis? -Mike On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote: Hello,

Re: AnalyzingInfixLookupFactory with multiple cores

2014-05-16 Thread Michael Sokolov
Thanks Dmitry! On 05/15/2014 07:54 AM, Dmitry Kan wrote: Hi Mike, The core name can be accessed via: ${solr.core.name} in solrconfig.xml (verified in a solr replication config). HTH, Dmitry On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: It

AnalyzingInfixLookupFactory with multiple cores

2014-05-15 Thread Michael Sokolov
It seems as if the location of the suggester dictionary directory is not core-specific, so when the suggester is defined for multiple cores, they collide: you get exceptions attempting to obtain the lock, and the suggestions bleed from one core to the other. There is an (undocumented) "indexP

Re: Website running Solr

2014-05-15 Thread Michael Sokolov
On 5/11/2014 12:55 PM, Olivier Austina wrote: Hi All, Is there a way to know if a website uses Solr? Thanks. Regards Olivier Ask the people who run the site?

Re: Can't use 2 highlighting components in the same solrconfig

2014-05-06 Thread Michael Sokolov
I don't know what the design was, but your use case seems valid to me: I think you should submit a ticket and a patch. If you write a test, I suppose it might be more likely to get accepted. -Mike On 5/6/2014 10:59 AM, Cario, Elaine wrote: I experimented locally with modifying the SolrCore c

Re: Use XSD or DTD to make Solr schema?

2014-05-06 Thread Michael Sokolov
I'm pretty sure there's nothing to automate that task, but there are some tools to help with indexing XML. Lux (http://luxdb.org) is one; it can index all the element text and attribute values, effectively creating an index for each tag name -- these are not specifically Solr/Lucene fields, bu

Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
on lucene 4.8? https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111 Michael Sokolov wrote: For posterity, in case anybody follows this thread, I tracked the problem down to WordDelimiterFilter; apparently it creates an offset of -1 in some case, which PostingsHighlighter rejects. -M

Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
For posterity, in case anybody follows this thread, I tracked the problem down to WordDelimiterFilter; apparently it creates an offset of -1 in some case, which PostingsHighlighter rejects. -Mike On 5/2/2014 10:20 AM, Michael Sokolov wrote: I checked using the analysis admin page, and I

Re: PostingHighlighter complains about no offsets

2014-05-02 Thread Michael Sokolov
I checked using the analysis admin page, and I believe there are offsets being generated (I assume start/end=offsets). So IDK I am going to try reindexing again. Maybe I neglected to reload the config before I indexed last time. -Mike On 05/02/2014 09:34 AM, Michael Sokolov wrote: I

PostingHighlighter complains about no offsets

2014-05-02 Thread Michael Sokolov
I've been wanting to try out the PostingsHighlighter, so I added storeOffsetsWithPositions to my field definition, enabled the highlighter in solrconfig.xml, reindexed and tried it out. When I issue a query I'm getting this error: field 'text' was indexed without offsets, cannot highlight

Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
On 4/27/14 7:02 PM, Michael Sokolov wrote: On 4/27/2014 6:30 PM, Trey Grainger wrote: So my question basically is: which restrictions are applied to the docset from which (field) facets are computed? Facets are generated based upon values found within the documents matching your "

Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
On 4/27/2014 6:30 PM, Trey Grainger wrote: So my question basically is: which restrictions are applied to the docset from which (field) facets are computed? Facets are generated based upon values found within the documents matching your "q=" parameter and also all of your "fq=" parameters. Basi
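A toy illustration of that rule, in plain Python rather than Solr: facet counts are computed over the documents that match the main query and every filter query. The documents, field names, and the `facet_counts` helper are invented for the example.

```python
from collections import Counter

# Hypothetical three-document dataset.
DOCS = [
    {"id": 1, "type": "book", "lang": "en"},
    {"id": 2, "type": "book", "lang": "fr"},
    {"id": 3, "type": "video", "lang": "en"},
]


def facet_counts(docs, field, q=None, fqs=()):
    """Count values of `field` over docs matching q AND every fq.

    q and each fq are plain predicates (doc -> bool); q=None stands in
    for a match-all query such as *:*.
    """
    predicates = [p for p in (q, *fqs) if p is not None]
    matching = [d for d in docs if all(p(d) for p in predicates)]
    return Counter(d[field] for d in matching)


# Match-all query: every type is counted.
print(facet_counts(DOCS, "type"))
# Restricting q on the faceted field narrows the docset, and the counts with it.
print(facet_counts(DOCS, "type", q=lambda d: d["type"] == "book"))
# An fq narrows the same docset further.
print(facet_counts(DOCS, "type", q=lambda d: d["type"] == "book",
                   fqs=[lambda d: d["lang"] == "en"]))
```

The sketch simply drops values with no matching documents; whether Solr lists zero-count values depends on facet.mincount.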

facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
I'm trying to understand the facet counts I'm getting back from Solr when the main query includes a term that restricts on a field that is being faceted. After reading the docs on the wiki (both wikis) I'm confused. In my little test dataset, if I facet on "type" and use q=*:*, I get facet c

Re: Solr How to sorting suggestions by sales

2014-04-19 Thread Michael Sokolov
The ordering at the lowest level in Lucene is controlled based on an arbitrary weighting factor: I believe the only option you have at the Solr level is to order by term value (e.g. alphabetically), or by term frequency. You could do this by creating a field with all of your "sales" - if you cre

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Michael Sokolov
I believe you could use term vectors to retrieve all the terms in a document, with their offsets. Retrieving them from the inverted index would be expensive since the index is term-oriented, not document-oriented. Without tv, I think you essentially have to scan the entire term dictionary loo
