Re: Solr suggest is related to second letter, not to initial letter

2015-02-15 Thread Michael Sokolov
StandardTokenizer splits your text into tokens, and the suggester suggests tokens independently. It sounds as if you want the suggestions to be based on the entire text (not just the current word), and that only adjacent words in the original should appear as suggestions. Assuming that's what

Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Michael Sokolov
On 02/17/2015 03:46 AM, Volkan Altan wrote: First of all thank you for your answer. You're welcome - thanks for sending a more complete example of your problem and expected behavior. I don’t want to use KeywordTokenizer. Because, as long as the compound words written by the user are availabl

Re: highlighting the boolean query

2015-02-24 Thread Michael Sokolov
There is also PostingsHighlighter -- I recommend it, if only for the performance improvement, which is substantial, but I'm not completely sure how it handles this issue. The one drawback I *am* aware of is that it is insensitive to positions (so words from phrases get highlighted even in isol

Re: Querying XML

2014-03-14 Thread Michael Sokolov
Yes, Lux automatically indexes text in XML elements associated with their element names so you can run efficient XPath/XQuery queries; in your case I would write: q=/MainData/Info/Info[@name="Bob"][city="Cincinnati"] or q=//Info[@name="Bob"][city="Cincinnati"] It also lets you mix "regular"

example schema now stores most field values

2014-03-15 Thread Michael Sokolov
While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields defined in the example schema.xml that used to be indexed and not stored are now defined as indexed and stored. Is there anything behind this change other than the idea that it would be more convenient to have all the values

Re: example schema now stores most field values

2014-03-15 Thread Michael Sokolov
at 1:02 PM, Michael Sokolov wrote: While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields defined in the example schema.xml that used to be indexed and not stored are now defined as indexed and stored. Is there anything behind this change other than the idea that it would be more

Re: example schema now stores most field values

2014-03-16 Thread Michael Sokolov
upansky -----Original Message- From: Michael Sokolov Sent: Saturday, March 15, 2014 1:02 PM To: solr-user@lucene.apache.org Subject: example schema now stores most field values While upgrading from 4.2.1 to 4.6.1 I noticed that many of the fields defined in the example schema.xml that used to

Re: example schema now stores most field values

2014-03-17 Thread Michael Sokolov
ly, it doesn't seem to be working. (Anonymous - via GTD book) On Sun, Mar 16, 2014 at 9:28 PM, Michael Sokolov wrote: Thanks for hunting that down, Jack. It may very well have been a change that we made (to remove the stored="true"). Sorry if I led you on a wild goose chase. A

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-20 Thread Michael Sokolov
I'm getting a similar exception when writing documents (on the client side). I can write one document fine, but the second (which is being routed to a different shard) generates the error. It happens every time - definitely not a resource issue or timing problem since this database is complet

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-21 Thread Michael Sokolov
an error at the same time and put it on pastebin or the like. Thanks, Greg On Mar 20, 2014, at 3:36 PM, Michael Sokolov wrote: I'm getting a similar exception when writing documents (on the client side). I can write one document fine, but the second (which is being routed to a differ

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-22 Thread Michael Sokolov
Excellent, thanks Shalin! On 3/22/2014 3:32 PM, Shalin Shekhar Mangar wrote: Thanks Michael! I just committed your fix. It will be released with 4.7.1 On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov wrote: I just managed to track this down -- as you said the disconnect was a red herring

Re: setting up solr on tomcat

2014-03-23 Thread Michael Sokolov
On 3/22/2014 2:16 AM, anupamk wrote: Hi, Is the solrTomcat wiki article valid for solr-4.7.0 ? http://wiki.apache.org/solr/SolrTomcat I am not able to deploy solr after following the instructions there. When I try to access the solr admin page I get a 404. I followed every step exactly as me

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderw
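
The difference between a counted and a binary term frequency can be sketched in a few lines of Python. This is only a toy scorer under stated assumptions: the function names (`tokenize`, `count_tf`, `binary_tf`, `score`) are made up for illustration, and the sum-of-tf score is not Lucene's actual similarity formula.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_tf(term, tokens):
    """Raw term frequency: how many times the term occurs in the field."""
    return tokens.count(term)


def binary_tf(term, tokens):
    """Binary term frequency: 1 if the term occurs at all, otherwise 0."""
    return 1 if term in tokens else 0


def score(query, field_value, tf):
    """Toy score: sum of per-term tf values (hypothetical, not Lucene scoring)."""
    tokens = tokenize(field_value)
    return sum(tf(term, tokens) for term in tokenize(query))


counted = score("new york", "New York, New York", count_tf)
binary = score("new york", "New York, New York", binary_tf)
plain = score("new york", "New York", binary_tf)
```

With the counted variant the repeated title scores twice as high as a title containing the phrase once; with the binary variant both titles score the same, which is the behavior the question asks for.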

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/3/14 7:46 AM, Michael Sokolov wrote: On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way to use a binary term freq

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Michael Sokolov
I had to grapple with something like this problem when I wrote Lux's app-server. I extended SolrDispatchFilter and handle parameter swizzling to keep everything nicey-nicey for Solr while being able to play games with parameters of my own. Perhaps this will give you some ideas: https://gith

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-07 Thread Michael Sokolov
getParameterNames before SolrDispatchFilter has a chance to access the InputStream. I opened https://issues.apache.org/jira/browse/SOLR-5969 to discuss further and attached our current patch. On Mon, Apr 7, 2014 at 2:02 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: I had to grappl

multiple analyzers for one field

2014-04-09 Thread Michael Sokolov
I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full

Re: multiple analyzers for one field

2014-04-10 Thread Michael Sokolov
Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal?

Re: multiple analyzers for one field

2014-04-10 Thread Michael Sokolov
, Apr 11, 2014 at 8:05 AM, Michael Sokolov wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of

Re: multiple analyzers for one field

2014-04-10 Thread Michael Sokolov
hin a field based upon multiple inputs. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search & Analytics @ CareerBuilder On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: The lack of response to this question makes m

[ANN] Solr learning resources on safariflow.com (w/subscription or free trial)

2014-04-11 Thread Michael Sokolov
I just wanted to let people know about some recent Solr books and videos that are now available at safariflow.com. You can sign up for a free trial and get instant access, buy a subscription, or you may already be a subscriber. I don't normally send out announcements like this, but because we

Re: Strange double-logging with log4j

2014-04-13 Thread Michael Sokolov
I've had this happen to me before too; it's always a mystery. I wonder if it has to do with specifying the "file" appender for both rootLogger and solrj? -Mike On 4/12/2014 5:20 PM, Shawn Heisey wrote: On 4/11/2014 3:21 PM, Shawn Heisey wrote: This is lucene_solr_4_7_2_r1586229, downloaded

Re: [ANN] Solr learning resources on safariflow.com (w/subscription or free trial)

2014-04-13 Thread Michael Sokolov
usage/statistics too. To know which chapters of my book were most useful/recommended. Regards, Alex On 11/04/2014 8:45 pm, "Michael Sokolov" wrote: I just wanted to let people know about some recent Solr books and videos that are now available at safariflow.com. You can sign up for a

Re: multiple analyzers for one field

2014-04-14 Thread Michael Sokolov
I lost the original thread; sorry for the new / repeated topic, but thought I would follow up to let y'all know that I ended up implementing Alex's idea to implement an UpdateRequestProcessor in order to apply different analysis to different fields when doing something analogous to copyFields.

Re: multiple analyzers for one field

2014-04-15 Thread Michael Sokolov
mine. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Apr 15, 2014 at 8:52 AM, Michael Sokolov wrote: I lost the original thread; sorry for the new / repeated topic, but thought I would follow up to

Re: multiple analyzers for one field

2014-04-15 Thread Michael Sokolov
experience thus sounds like either two or no blog posts. I certainly have killed a bunch of good articles by waiting for perfection:-) On 15/04/2014 7:01 pm, "Michael Sokolov" wrote: A blog post is a great idea, Alex! I think I should wait until I have a complete end-to-end implem

multi-field suggestions

2014-04-18 Thread Michael Sokolov
I've been working on getting AnalyzingInfixSuggester to make suggestions using tokens drawn from multiple fields. I've done this by copying tokens from each of those fields into a destination field, and building suggestions using that destination field. This allows me to use different analysi

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Michael Sokolov
I believe you could use term vectors to retrieve all the terms in a document, with their offsets. Retrieving them from the inverted index would be expensive since the index is term-oriented, not document-oriented. Without tv, I think you essentially have to scan the entire term dictionary loo
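
A rough Python sketch of why reconstruction from the inverted index is expensive: the index maps terms to documents, so recovering one document's text means visiting every term. The data structure and function names here are illustrative assumptions, not Lucene's API.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy positional inverted index: term -> {doc_id: [positions]}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(pos)
    return postings


def reconstruct(postings, doc_id):
    """Rebuild a document's token stream by scanning the whole term dictionary."""
    slots = {}
    for term, by_doc in postings.items():  # every term is visited, matching or not
        for pos in by_doc.get(doc_id, ()):
            slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))


docs = {1: "New York New York", 2: "solr suggests tokens"}
index = build_index(docs)
rebuilt = reconstruct(index, 1)
```

A term vector is, in effect, the per-document view of the same information, which is why it avoids the full scan. Note that only analyzed tokens come back (lowercased here), not the original text.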

Re: Solr How to sorting suggestions by sales

2014-04-19 Thread Michael Sokolov
The ordering at the lowest level in Lucene is controlled based on an arbitrary weighting factor: I believe the only option you have at the Solr level is to order by term value (eg alphabetically), or by term frequency. You could do this by creating a field with all of your "sales" - if you cre

facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
I'm trying to understand the facet counts I'm getting back from Solr when the main query includes a term that restricts on a field that is being faceted. After reading the docs on the wiki (both wikis) I'm confused. In my little test dataset, if I facet on "type" and use q=*:*, I get facet c

Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
On 4/27/2014 6:30 PM, Trey Grainger wrote: So my question basically is: which restrictions are applied to the docset from which (field) facets are computed? Facets are generated based upon values found within the documents matching your "q=" parameter and also all of your "fq=" parameters. Basi
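
Trey's description (facets are counted over the documents matching `q` and every `fq`) can be sketched in Python as follows. The documents, field names, and the `facet_counts` helper are hypothetical; queries are modeled as plain predicates rather than parsed Solr syntax.

```python
from collections import Counter


def facet_counts(docs, facet_field, q=None, fqs=()):
    """Count facet values over docs that match q and all filter queries."""
    predicates = ([q] if q is not None else []) + list(fqs)
    matching = [d for d in docs if all(p(d) for p in predicates)]
    return Counter(d[facet_field] for d in matching if facet_field in d)


docs = [
    {"type": "book", "lang": "en"},
    {"type": "book", "lang": "fr"},
    {"type": "video", "lang": "en"},
]

# q=*:* : every document contributes to the counts
all_counts = facet_counts(docs, "type")

# q=type:book : restricting on the faceted field leaves only that value
book_counts = facet_counts(docs, "type", q=lambda d: d["type"] == "book")

# q=*:* with fq=lang:en : filter queries narrow the docset as well
en_counts = facet_counts(docs, "type", fqs=[lambda d: d["lang"] == "en"])
```

In this model a restriction on the faceted field itself removes the other values from the counts, because those documents are no longer in the matching set.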

Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
On 4/27/14 7:02 PM, Michael Sokolov wrote: On 4/27/2014 6:30 PM, Trey Grainger wrote: So my question basically is: which restrictions are applied to the docset from which (field) facets are computed? Facets are generated based upon values found within the documents matching your "

PostingHighlighter complains about no offsets

2014-05-02 Thread Michael Sokolov
I've been wanting to try out the PostingsHighlighter, so I added storeOffsetsWithPositions to my field definition, enabled the highlighter in solrconfig.xml, reindexed and tried it out. When I issue a query I'm getting this error: |field 'text' was indexed without offsets, cannot highlight

Re: PostingHighlighter complains about no offsets

2014-05-02 Thread Michael Sokolov
I checked using the analysis admin page, and I believe there are offsets being generated (I assume start/end=offsets). So IDK I am going to try reindexing again. Maybe I neglected to reload the config before I indexed last time. -Mike On 05/02/2014 09:34 AM, Michael Sokolov wrote: I'

Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
For posterity, in case anybody follows this thread, I tracked the problem down to WordDelimiterFilter; apparently it creates an offset of -1 in some case, which PostingsHighlighter rejects. -Mike On 5/2/2014 10:20 AM, Michael Sokolov wrote: I checked using the analysis admin page, and I

Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
on lucene 4.8? https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111 Michael Sokolov wrote: For posterity, in case anybody follows this thread, I tracked the problem down to WordDelimiterFilter; apparently it creates an offset of -1 in some case, which PostingsHighlighter rejects. -M

Re: Use XSD or DTD to make Solr schema?

2014-05-06 Thread Michael Sokolov
I'm pretty sure there's nothing to automate that task, but there are some tools to help with indexing XML. Lux (http://luxdb.org) is one; it can index all the element text and attribute values, effectively creating an index for each tag name -- these are not specifically Solr/Lucene fields, bu

Re: Can't use 2 highlighting components in the same solrconfig

2014-05-06 Thread Michael Sokolov
I don't know what the design was, but your use case seems valid to me: I think you should submit a ticket and a patch. If you write a test, I suppose it might be more likely to get accepted. -Mike On 5/6/2014 10:59 AM, Cario, Elaine wrote: I experimented locally with modifying the SolrCore c

Re: Website running Solr

2014-05-15 Thread Michael Sokolov
On 5/11/2014 12:55 PM, Olivier Austina wrote: Hi All, Is there a way to know if a website uses Solr? Thanks. Regards Olivier Ask the people who run the site?

AnalyzingInfixLookupFactory with multiple cores

2014-05-15 Thread Michael Sokolov
It seems as if the location of the suggester dictionary directory is not core-specific, so when the suggester is defined for multiple cores, they collide: you get exceptions attempting to obtain the lock, and the suggestions bleed from one core to the other. There is an (undocumented) "indexP

Re: AnalyzingInfixLookupFactory with multiple cores

2014-05-16 Thread Michael Sokolov
Thanks Dmitry! On 05/15/2014 07:54 AM, Dmitry Kan wrote: Hi Mike, The core name can be accessed via: ${solr.core.name} in solrconfig.xml (verified in a solr replication config). HTH, Dmitry On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: It

Re: Solr 4.8: Does eDisMax parser calls analyzer chain to tokenize?

2014-05-17 Thread Michael Sokolov
Alex - the query parsers generally accept an analyzer, which they must apply after they perform their own tokenization. Consider: how would a capitalized query term match lower-cased terms in the index without query analysis? -Mike On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote: Hello,

Re: Uneven shard heap usage

2014-05-31 Thread Michael Sokolov
Is it possible that all your requests are routed to that single shard? I.e. you are not using the smart client that round-robins requests? I think that could cause all of the merging of results to be done on a single node. Also - is it possible you have a "bad" document in that shard? Like o

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage difference between one 180MB document and 180 1 MB documents from an indexing perspective. Making the field "stored" is more likely to lead

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
this shard Best, Erick On Mon, Jun 2, 2014 at 4:27 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage diffe

Re: OOV queries

2014-06-05 Thread Michael Sokolov
It seems as if 0-hit queries should be pretty fast since they can terminate very early? Are you seeing a big difference between first-time and subsequent (cached) no-match queries? -Mike On 6/5/2014 8:47 AM, Dmitry Kan wrote: Hi, Solr is good at caching: even if first "cold" query takes lo

Re: Solr - what's the next big thing?

2013-10-29 Thread Michael Sokolov
On 10/26/2013 8:31 PM, Bill Bell wrote: Full JSON support deep complex object indexing and search Game changer Bill Bell Sent from mobile Not JSON (yet?) but take a look at http://luxdb.org which does XML indexing and search. We index all the text of all the nodes in your tree: no nee

[ANN] Lux release 0.11.2

2013-11-05 Thread Michael Sokolov
I'm pleased to announce the release of Lux, version 0.11.2, the Dublin edition. There have been the usual round of bug fixes and enhancements, but the main news with this release is the inclusion of support for SolrCloud. You can now store and search XML documents in a distributed index using

Re: character encoding issue...

2013-11-10 Thread Michael Sokolov
Don't feel bad: character encoding problems are often said to be among the hardest in software engineering. There's no simple answer to problems like this since as Erick said, any tool in your chain could be the culprit. I doubt anyone on this list will be able to guess "the answer" since the

Re: Thought exercise: features for Solr client

2013-11-14 Thread Michael Sokolov
I think there is a place for a client-side query hierarchy. It would be nice if you could build a Lucene Query and the Solr client would serialize it for you. If there were a general-purpose query serialization library then you could support a similar programming model for Lucene-only and wit

Re: distributed search is significantly slower than direct search

2013-11-16 Thread Michael Sokolov
Did you say what the memory profile of your machine is? How much memory, and how large are the shards? This is just a random guess, but it might be that if you are memory-constrained, there is a lot of thrashing caused by paging (swapping?) in and out the sharded indexes while a single index c

getting matching term count for a query

2013-11-18 Thread Michael Sokolov
Some of our customers want to display a "number of matches" score next to each search result. I think what they want is to list the number of matches that will be displayed when the entire document is highlighted. But this can be slow to do for every search result (some documents can be very

Re: getting matching term count for a query

2013-11-18 Thread Michael Sokolov
OK -- I did find SOLR-1298 <https://issues.apache.org/jira/browse/SOLR-1298> which explains how to request the function as a field value. Still looking for a function that does what I asked for ... On 11/18/2013 11:55 AM, Michael S

NPE in function query, was: Re: getting matching term count for a query

2013-11-18 Thread Michael Sokolov
return new SumFloatFunction(termcounts); } } On 11/18/13 2:19 PM, Michael Sokolov wrote: OK -- I did find SOLR-1298 <https://issues.apache.org/jira/browse/SOLR-1298> which explains how to request the function as a field value. Still looking for a function that does what I asked for ... On 11

Re: NPE in function query, was: Re: getting matching term count for a query

2013-11-18 Thread Michael Sokolov
t(); for (Term t : terms) { if (fields.isEmpty() || fields.contains (t.field())) { termcounts.add (new TermFreqValueSource(t.field(), t.text(), t.field(), t.bytes())); } } return new SumFloatFunction(termcounts.toArray(new ValueSource[termcounts.size()])); } } On 11/18/13 8:38 PM, Michael Sokolov wrote:

Re: How to index X™ as ™ (HTML decimal entity)

2013-11-21 Thread Michael Sokolov
ny case. -- Jack Krupansky -Original Message- From: Michael Sokolov Sent: Thursday, November 21, 2013 8:56 AM To: solr-user@lucene.apache.org Subject: Re: How to index X™ as ™ (HTML decimal entity) I have to agree w/Walter. Use unicode as a storage format. The entity encodings are for trans

Re: How to index X™ as ™ (HTML decimal entity)

2013-11-21 Thread Michael Sokolov
I have to agree w/Walter. Use unicode as a storage format. The entity encodings are for transfer/interchange. Encode/decode on the way in and out if you have to. Would you store "a" as "A" ? It makes it impossible to search for, for one thing. What if someone wants to search for the TM ch

Revolution writeup

2013-11-25 Thread Michael Sokolov
I just posted a writeup of the Lucene/Solr Revolution Dublin conference. I've been waiting for videos to become available, but I got impatient. Slides are there, mostly though. Sorry if I missed your talk -- I'm hoping to catch up when the videos are posted... http://blog.safariflow.com/201

Re: post filtering for boolean filter queries

2013-12-03 Thread Michael Sokolov
On 12/03/2013 01:55 AM, Dmitry Kan wrote: Hello! We have been experimenting with post filtering lately. Our setup is a filter having long boolean query; drawing the example from the Dublin's Stump the Chump: fq=UserId:(user1 OR user2 OR...OR user1000) The underlying issue impacting performanc

Re: Indexing on plain text and binary data in a single HTTP POST request

2013-12-09 Thread Michael Sokolov
On 12/9/2013 11:13 PM, neerajp wrote: Hi, Pls. find my response in-line: That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, y

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Michael Sokolov
Have you considered using a custom UpdateProcessor to catch the exception and provide more context in the logs? -Mike On 01/03/2014 03:33 PM, Benson Margulies wrote: Robert, Yes, if the problem was not data-dependent, indeed I wouldn't need to index anything. However, I've run a small mountai

Re: Tracking down the input that hits an analysis chain bug

2014-01-05 Thread Michael Sokolov
t up OK, I think you get the insert messages at INFO level? -Mike On 1/4/2014 9:24 PM, Benson Margulies wrote: I rather assumed that there was some log4j-ish config to be set that would do this for me. Lacking one, I guess I'll end up there. On Fri, Jan 3, 2014 at 8:23 PM, Michael Sokolo

Re: MergePolicy for append-only indices?

2014-01-06 Thread Michael Sokolov
I think the key optimization when there are no deletions is that you don't need to renumber documents and can bulk-copy blocks of contiguous documents, and that is independent of merge policy. I think :) -Mike On 01/06/2014 01:54 PM, Shawn Heisey wrote: On 1/6/2014 11:24 AM, Otis Gospodnetic

Re: Where is a canonical SolrJ example(s)?

2014-01-28 Thread Michael Sokolov
On 01/28/2014 11:55 AM, Alexandre Rafalovitch wrote: As to ESS, like I mentioned, the classpath issue seems to be quite a challenge. Again, perhaps not something that shows up during the testing because the directory layout during testing is rather different from the end-user's layout. I'm not s

Re: Where is a canonical SolrJ example(s)?

2014-01-29 Thread Michael Sokolov
, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Jan 29, 2014 at 5:35 AM,

Re: block join and atomic updates

2014-02-19 Thread Michael Sokolov
Maybe he can use updateable docvalues (LUCENE-5189)? I heard that was a thing. Has it made its way into Solr in some way? -Mike On 2/19/2014 4:23 AM, Mikhail Khludnev wrote: Just a side note. Sidecar index might be really useful for updating blocked docs, but it's in experimenting stage iirc

Re: Solr is NoSQL database or not?

2014-03-02 Thread Michael Sokolov
On 3/1/2014 6:53 PM, Jack Krupansky wrote: NoSQL? To me it's just a marketing term, like Big Data. Data store? That does imply support for persistence, as opposed to mere caching, but mere persistence doesn't assure that the store is suitable for use as a System of Record which is a requiremen

Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Michael Sokolov
On 3/3/2014 1:54 AM, KNitin wrote: 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings) As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow

Re: SOLRJ and SOLR compatibility

2014-03-04 Thread Michael Sokolov
Does that mean newer clients work with older servers (I think so, from reading this thread), or the other way round? If so, I guess the advice would be -- upgrade all your clients first? -Mike On 03/04/2014 10:00 AM, Mark Miller wrote: Yeah, sorry :( the fix applied is only for compatibil

Re: SOLRJ and SOLR compatibility

2014-03-04 Thread Michael Sokolov
ld client -> new server or new client -> old server), but we as a community need to pick one and build out the test suites that ensure SolrJ compatibility with different versions. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com ________ F

Re: SOLRJ and SOLR compatibility

2014-03-05 Thread Michael Sokolov
On 3/5/2014 1:36 AM, Shawn Heisey wrote: On 3/4/2014 8:15 PM, Michael Sokolov wrote: Thanks, Tim, it's great to hear you say that! I tried to make that point myself with various patches, but they never really got taken up by committers, so I kind of gave up, but I agree with you 100% this

update external file

2014-10-23 Thread Michael Sokolov
I've been looking at ExternalFileField to handle popularity boosting, since Solr updatable docvalues (SOLR-5944) isn't quite there yet. My question is whether there is any support for uploading the external file via Solr, or if people do that some other (external, I guess) way? -Mike

Re: update external file

2014-10-23 Thread Michael Sokolov
Thanks for the links, Ramzi. I had already read the wiki page, which merely talks about how to reload the file into memory once it has been updated on disk. It doesn't mention any support for uploading that I can see. Did I miss it? -Mike On 10/23/14 1:36 PM, Ramzi Alqrainy wrote: Of cour

Re: update external file

2014-10-23 Thread Michael Sokolov
That's what I thought; thanks, Markus. On 10/23/14 2:19 PM, Markus Jelsma wrote: You either need to upload them and issue the reload command, or download them from the machine, and then issue the reload command. There is no REST support for it (yet) like the synonym filter, or was it stop filt

Re: recip function error

2014-10-23 Thread Michael Sokolov
3.16e-11.0 looks fishy to me On 10/23/14 5:09 PM, eShard wrote: Good evening, I'm using solr 4.0 Final. I tried using this function boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) but it fails with this error: org.apache.lucene.queryparser.classic.ParseException: Expected ')' at posi
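
For reference, Solr documents `recip(x,m,a,b)` as computing `a/(m*x+b)`. The Python sketch below evaluates that formula with the constants from the quoted query and shows that the literal `3.16e-11.0` is not a well-formed float in Python either. It is an illustration only: Solr's own parser is a different piece of code, and `is_valid_float` is a made-up helper.

```python
def recip(x, m, a, b):
    """Reciprocal function as documented for Solr: a / (m*x + b)."""
    return a / (m * x + b)


def is_valid_float(literal):
    """Return True if Python can parse the literal as a float."""
    try:
        float(literal)
        return True
    except ValueError:
        return False


MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.16e10 milliseconds

boost_now = recip(0, 3.16e-11, 0.08, 0.05)                 # document dated "now"
boost_year_old = recip(MS_PER_YEAR, 3.16e-11, 0.08, 0.05)  # document one year old

good_literal = is_valid_float("3.16e-11")
bad_literal = is_valid_float("3.16e-11.0")
```

The exponent of a floating-point literal has to be an integer, so `3.16e-11` parses while `3.16e-11.0` does not. In the quoted expression the closing parenthesis of `ms(...)` also appears to come after `0.05` rather than after `startdatez`, which would leave `recip` with a single argument.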

Re: AW: AW: (auto)suggestions, but ony from a "filtered" set of documents

2014-10-26 Thread Michael Sokolov
This project (https://github.com/safarijv/ifpress-solr-plugin/) has some examples of custom Solr UpdateRequestProcessors that feed a single suggester from multiple fields, applying different weights to them, using complete values from some and analyzing others into tokens. The first thing I di

Re: AW: AW: AW: (auto)suggestions, but ony from a "filtered" set of documents

2014-10-27 Thread Michael Sokolov
really offer a solution to your problem, but there are some possibly helpful similarities: you will probably want to write a custom UpdateRequestProcessor, and you will want to feed the suggester with a custom Dictionary / InputIterator as I have done in that example. -Mike -Clemens -U

function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
I noticed that when you include a function as a result field, the corresponding key in the result markup includes trailing whitespace, which seems like a bug. I wonder if anyone knows if there is a ticket for this already? Example: fl="id field(units_used) archive_id" ends up returning resu

Re: function results' names include trailing whitespace

2014-10-29 Thread Michael Sokolov
OK, I opened SOLR-6672; not sure how I stumbled into using white space; I would ordinarily use commas too, I think. -Mike On 10/29/14 1:23 PM, Chris Hostetter wrote: : fl="id field(units_used) archive_id" I didn't even realize until today that fl was documented to support space separated fiel

Re: dynamically change default update chain

2014-11-03 Thread Michael Sokolov
Just to get the obvious sledgehammer solution out of the way - upload a new, edited solrconfig.xml with the default changed, and reload the core. -Mike On 11/3/14 6:28 AM, Dmitry Kan wrote: Hello solr fellows, I'm working on a project that involves using two update chains. One default chain

Re: Missing log entries with log4j log rotation

2014-11-04 Thread Michael Sokolov
Shawn this is really weird -- we run log4j in lots of installations and have never seen an issue like this. I wonder if you might be running some other log rotation software (like logrotate) that is somehow getting in the way or conflicting? -Mike On 11/01/2014 01:45 PM, Shawn Heisey wrote:

Re: Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Michael Sokolov
You didn't describe your analysis chain, but maybe you are using WordDelimiterFilter to break up hyphenated words? If so, it has a protwords.txt feature that lets you specify exceptions -Mike On 11/5/2014 5:36 PM, Michael Della Bitta wrote: Pretty sure what you need is called KeywordMarkerFil

Re: Best practice: Autosuggest/autocomplete vs. "real search"

2014-11-10 Thread Michael Sokolov
The goal is to ensure that suggestions from autocomplete are actually terms in the main index, so that the suggestions will actually result in matches. You've considered expanding the main index by adding the suggestion n-grams to it, but it would probably be better to alter your suggester so

Re: How to suggest from multiple fields?

2014-11-11 Thread Michael Sokolov
The usual approach is to use copyField to copy multiple fields to a single field. I posted a solution using an UpdateRequestProcessor to merge fields, but with different analyzers, here: https://blog.safaribooksonline.com/2014/04/15/search-suggestions-with-solr-2/ My latest approach is this:

Re: DIH Blob data

2014-11-12 Thread Michael Sokolov
We routinely store images and pdfs in Solr. There *is* a benefit, since you don't need to manage another storage system, you don't have to worry about Solr getting out of sync with the other system, you can use Solr replication for all your assets, etc. I don't use DIH, so personally I don't c

Re: Suggest dictionaries not rebuilding after restart

2014-11-13 Thread Michael Sokolov
I believe the spellchecker component persists these indexes now and reloads them on restart rather than rebuilding. -Mike On 11/13/14 7:40 PM, Walter Underwood wrote: We have to manually rebuild the suggest dictionaries after a restart. This seems odd, since someone else had a problem because

Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
4/14 2:01 AM, Walter Underwood wrote: We get no suggestions until we force a build with suggest.build=true. Maybe we need to define a spellchecker component to get that behavior? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 13, 2014, at 10:56 PM, Michael

Re: DIH Blob data

2014-11-14 Thread Michael Sokolov
an use filter query like "fq=terms:a:1" On Nov 13, 2014, at 3:59 AM, "Michael Sokolov" wrote: We routinely store images and pdfs in Solr. There *is* a benefit, since you don't need to manage another storage system, you don't have to worry about Solr getting out of sync with

Re: DIH Blob data

2014-11-14 Thread Michael Sokolov
On 11/14/2014 01:43 PM, Erick Erickson wrote: Just skimming, so maybe I misinterpreted. ExternalFileField and ExternalFileFieldReloader refer to storing values for each doc in an external file, they have nothing to do with storing _files_. The usual pattern is to have Solr store just enough da

Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Michael Sokolov
Mike On 11/14/14 2:01 AM, Walter Underwood wrote: We get no suggestions until we force a build with suggest.build=true. Maybe we need to define a spellchecker component to get that behavior? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 13, 2014, at

Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
nerating multiple "stems" causes issues On 11/18/2014 02:33 PM, Michael Sokolov wrote: I find that a query for stemmed terms sometimes fails with the edismax query parser and hunspell stemmer. Looking at the output of analysis for the query (text:following) I can see that it generates two

problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
I find that a query for stemmed terms sometimes fails with the edismax query parser and hunspell stemmer. Looking at the output of analysis for the query (text:following) I can see that it generates two different terms at the same position: "follow" and "following". Then edismax seems to genera

Re: problems when hunspell returns multiple stems

2014-11-18 Thread Michael Sokolov
OK - please disregard; I found a rogue new component in our analyzer that was messing everything up. The hunspell behavior was perhaps a little confusing, but I don't believe it leads to broken queries. -Mike On 11/18/2014 02:38 PM, Michael Sokolov wrote: followup - hunspell has: f

Re: Handling intersection facets of many values

2014-11-20 Thread Michael Sokolov
If you're willing to write some Java you can do something more efficient by intersecting two terms enumerations: this works with constant memory for any number of values in two fields, basically like intersecting any two sorted lists, you leap frog between them. I have an example if you're int
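
A minimal sketch of the leapfrog idea described above, written in Python over plain sorted lists rather than Lucene terms enumerations. The function name and the use of `bisect_left` as the "seek" step are assumptions made for illustration; this is not the Java example the message refers to.

```python
from bisect import bisect_left


def leapfrog_intersect(a, b):
    """Yield the values present in both sorted, duplicate-free sequences.

    Only two cursors are kept, so extra memory stays constant no matter
    how many values each sequence holds.
    """
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            yield a[i]
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Leap forward in a to the first value >= b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Leap forward in b to the first value >= a[i].
            j = bisect_left(b, a[i], j + 1)


field_one = ["apple", "kiwi", "pear"]
field_two = ["banana", "kiwi", "pear", "plum"]
common = list(leapfrog_intersect(field_one, field_two))
```

With a real terms enumeration the `bisect_left` call would presumably be replaced by the enumeration's own seek operation, which is what makes skipping long runs of non-matching terms cheap.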

Re: Error while initializing EmbeddedSolrServer

2014-11-23 Thread Michael Sokolov
Those SPI classes rely on a configuration file that gets stored in the META-INF folder. I'm not familiar with how OSGi works, but I'm pretty sure that failure is because the file META-INF/services/org.apache.lucene.codecs.Codec (you'll see it in the lucene-core jar) can't be found -Mike On

Re: matching shingles issue

2014-11-24 Thread Michael Sokolov
maybe try description_shingle:(Highest quality) On 11/24/14 1:46 PM, vit wrote: I have Solr 4.2.1 I am using the following analyser:

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no standar
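
A rough sketch of the single-field alternative mentioned above, in Python rather than as a Lucene TokenFilter. It only models the output of such a filter as (term, position) pairs; the function name is hypothetical and no existing Solr component is implied.

```python
def both_cases(tokens):
    """Return (term, position) pairs, adding a lowercased variant
    at the same position whenever it differs from the original term."""
    out = []
    for position, term in enumerate(tokens):
        out.append((term, position))
        lowered = term.lower()
        if lowered != term:
            # Same position, so phrase and proximity matching still line up.
            out.append((lowered, position))
    return out


indexed = both_cases(["Apple", "pie"])
```

Under this scheme a query for "apple" with no case folding matches documents containing either "Apple" or "apple", while a query for "Apple" matches only the capitalized form.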

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
right -- missed Ahmet's answer there in my haste to respond ... -Mike On 11/25/14 6:56 AM, Ahmet Arslan wrote: Hi Apurv, I wouldn't worry about index size, increase in index size is not linear (2x) like that. Please see similar discussion : https://issues.apache.org/jira/browse/LUCENE-5620 A

Re: Fwd: Change in the Score of Similiar Documents

2014-11-25 Thread Michael Sokolov
Scores are related to total term frequencies *in each shard*, not globally, and I think they may include term counts from deleted documents as well, which could account for the discrepancy in scores across the two shards. -Mike On 11/25/14 3:22 AM, rashi gandhi wrote: Hi, I have created t
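
To illustrate why per-shard statistics change scores, here is a small Python sketch using the classic Lucene TF-IDF inverse document frequency formula, idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies are made-up numbers, and the sketch assumes the default TF-IDF similarity is in use.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic TF-IDF inverse document frequency for one term in one index."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same term on two shards of equal size.
idf_shard_a = classic_idf(doc_freq=10, num_docs=1000)
idf_shard_b = classic_idf(doc_freq=40, num_docs=1000)
```

Because each shard computes idf from its own counts, two otherwise identical documents can receive different scores depending on which shard holds them.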

Re: updateNumericDocValue in solr 4.6.1

2014-11-26 Thread Michael Sokolov
Yes - here's a working example we have in production (tested in 4.8.1 and 4.10.2, but the underlying lucene stuff hasn't changed since 4.6.1 I'm pretty sure): https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/processor/UpdateDocValuesProcessor.
