Re: [POLL] How do you (like to) do logging with Solr

2011-05-17 Thread Michael Sokolov
On 5/16/2011 7:50 PM, Chris Hostetter wrote: : This poll is to investigate how you currently do or would like to do : logging with Solr when deploying solr.war to a SEPARATE java application : server (such as Tomcat, Resin etc) outside of the bundled FWIW... a) the context of this poll is SOLR-

Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
As far as I know this is not possible today with either Solr's 4.0 grouping impl or the new grouping module (soon to be grouping in Solr 3.x). I'm not sure about the patch on SOLR-236 though. But it's an interesting use case; it's a compound group key, right? You want to group by a tuple (X, Y).

Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
Start here: https://issues.apache.org/jira/browse/LUCENE Create an account (it's free), open an issue and set the component to "modules/grouping", fill in the fields, and submit it :) Then maybe make a patch and attach it! Genericizing the per-doc grouping key is important; we have an issue open

Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
Ahh, that's because you opened a Solr not a Lucene issue ;) The "modules" (incl. new grouping module) are under Lucene. That's fine, we can leave it as a Solr issue. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 4:10 PM, arian487 wrote: > https://issues.apache.org/jira/browse/SO

Re: Fuzzy search and solr 4.0

2011-05-19 Thread Michael McCandless
Well the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0. But the bad news is... FuzzyQuery won't do what you need here. You need some sort of FuzzyPhraseQuery, which is able to replace terms similar to one another (comp/company/corporation) by some metric. I don't know of s

Re: chinese SOLR query parser

2011-05-21 Thread Michael McCandless
Unfortunately, Solr's defaults (example schema) are unusable for non-whitespace languages... see: http://markmail.org/thread/ww6mhfi3rfpngmc5 So it could be you need to turn off autoGeneratePhraseQueries in your fieldType? We are working towards fixing the example schema (for 3.2/4.0) in htt

Re: chinese SOLR query parser

2011-05-23 Thread Michael McCandless
> > Thanks > > > --- On Sat, 5/21/11, Michael McCandless wrote: > >> From: Michael McCandless >> Subject: Re: chinese SOLR query parser >> To: solr-user@lucene.apache.org >> Date: Saturday, May 21, 2011, 6:14 PM >> Unfortunately, Solr's

Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Michael Lackhoff
ecify field:value, it shows 0 results. Can anyone explain? I guess you copy the field to your default search field. -Michael

Re: problem in setting field attribute in schema.xml

2011-05-26 Thread Michael Lackhoff
t you didn't get a result here if you didn't index "field". -Michael

Re: problem in setting field attribute in schema.xml

2011-05-26 Thread Michael Lackhoff
y the purpose of "indexed" and "stored". -Michael

Re: Nested grouping/field collapsing

2011-05-27 Thread Michael McCandless
Can you open a Lucene issue (against the new grouping module) for this? I think this is a compelling use case that we should try to support. In theory, with the "general" two-pass grouping collector, this should be possible, but will require three passes, and we also must generalize the 2nd pass

Re: Debugging a Solr/Jetty Hung Process

2011-06-02 Thread Michael Sokolov
If you have an SNMP infrastructure available (nagios or similar) you should be able to set up a polling monitor that will keep statistics on the number of threads in your jvm and even allow you to inspect their stacks remotely. You can set alarms so you will be notified if cpu thread count or

Re: Index vs. Query Time Aware Filters

2011-06-02 Thread Michael Sokolov
It doesn't look like this is supported in any way that is at all straightforward. http://wiki.apache.org/solr/SolrPlugins talks about the easy ways to parameterize plugins, and they don't include what you're after. I think maybe you could extend the query parser you are currently using, wrap

Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping

2011-06-02 Thread Michael Sokolov
Just keep one extra facet value hidden; ie request one more than you need to show the current page. If you get it, there are more (show the next button), otherwise there aren't. You can't page arbitrarily deep like this, but you can have a next button reliably enabled or disabled. On 6/1/201
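The look-ahead trick described above can be sketched in plain Java. This is only an illustration, not code from the thread: the class NextPageProbe, its Page holder, and the method names are hypothetical, and the list passed in stands for whatever values came back after asking the server for pageSize + 1 entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides whether a "next" button should be enabled
// by requesting one more entry than the page actually displays.
final class NextPageProbe {

    /** One visible page plus a flag saying whether a further page exists. */
    static final class Page<T> {
        final List<T> items;
        final boolean hasNext;

        Page(List<T> items, boolean hasNext) {
            this.items = items;
            this.hasNext = hasNext;
        }
    }

    /** Number of entries to request from the server: one more than is shown. */
    static int rowsToRequest(int pageSize) {
        return pageSize + 1;
    }

    /** Hides the extra look-ahead entry and records whether it was present. */
    static <T> Page<T> toPage(List<T> fetched, int pageSize) {
        boolean hasNext = fetched.size() > pageSize;
        int visibleCount = Math.min(pageSize, fetched.size());
        List<T> visible = new ArrayList<>(fetched.subList(0, visibleCount));
        return new Page<>(visible, hasNext);
    }
}
```

As the message notes, this only tells you whether at least one more entry exists; it does not give a total count or allow jumping to an arbitrary page.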

Re: Expunging deletes from a very large index

2011-06-06 Thread Michael McCandless
You can drop your mergeFactor to 2 and then run expungeDeletes? This will make the operation take longer but (assuming you have > 3 segments in your index) should use less transient disk space. You could also make a custom merge policy, that expunges one segment at a time (even slower but even le

Re: Pattern: Is there a method of resolving multivalued date ranges into a single document?

2011-06-11 Thread Michael Sokolov
Juidoo - there's no field wildcarding in Solr as your example shows. You might want to consider building a document for each movie time that includes all the information you need to search on: times, movie name, and other details. Otherwise you need a join operation to search across related d

Re: Copying few field using copyField to non multiValued field

2011-06-15 Thread Michael Kuhlmann
In addition to Bob's response: On 15.06.2011 13:59, Omri Cohen wrote: [...] > stored="true" required="false" /> > stored="true" required="false" /> > stored="true" required="false" /> > stored="true" required="false" />. 1. The value for "indexed" should either be "true" or "fals

Re: High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread Michael Sokolov
Or another way of saying this is - what is the maximum throughput you get from the system (qps / indexing speed, etc) since that is what you really (should) care about - and how does it compare to the previous setup? -Mike On 6/15/2011 3:52 PM, Erick Erickson wrote: Yes, 100% CPU utilization

Re: Copying few field using copyField to non multiValued field

2011-06-16 Thread Michael Kuhlmann
Hi Omri, there are two limitations: 1. You can't sort on a multiValued field. (Anyway, on which of the copied fields would you want to sort first?) 2. You can't make the multiValued field the unique key. Neither is a real limitation: 1. Better sort on at_country, at_state, at_city instead. 2. Sim

Re: Field Collapsing and Grouping in Solr 3.2

2011-06-16 Thread Michael McCandless
Alas, no, not yet.. grouping/field collapse has had a long history with Solr. There were many iterations on SOLR-236, but that impl was never committed. Instead, SOLR-1682 was committed, but committed only to trunk (never backported to 3.x despite requests). Then, a new grouping module was facto

omitTermFreqAndPositions in a TextField fieldType

2011-06-16 Thread Michael Ryan
OMIT_TF_POSITIONS; Does it even make sense to use omitTermFreqAndPositions for a TextField, or am I perhaps doing something I shouldn't be? -Michael

Re: Optimize taking two steps and extra disk space

2011-06-19 Thread Michael McCandless
With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but 35 mergeFactor. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for optimize (maxMergeAtOnceExplicit)... so you could eg

Re: paging and maintaingin a cursor just like ScrollableResultSet

2011-06-19 Thread Michael Sokolov
One technique I've used to page through huge result sets that could help: if you have a sortable key (like an id), you can just fetch all docs, sorted by the key, and then on subsequent page requests use the last value from the previous page as a filter in a range term like: id:[ TO *] where
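A minimal sketch of the key-based paging idea above, in plain Java. The class and method names are hypothetical, and the sketch assumes ids without whitespace or query-syntax characters. It also assumes Solr's curly-brace range syntax for an exclusive lower bound, so that the last document of the previous page is not returned again; with square brackets the range is inclusive and the first hit of each page would repeat.

```java
// Hypothetical helper that builds the filter for the next page when
// results are sorted ascending by a unique, sortable key such as "id".
final class KeysetPaging {

    /**
     * Returns a range filter selecting only documents whose key sorts
     * after lastValue. Pass null for the first page.
     */
    static String filterAfter(String field, String lastValue) {
        if (lastValue == null) {
            // First page: match every document that has the key field.
            return field + ":[* TO *]";
        }
        // Exclusive range: everything strictly after the last key seen.
        return field + ":{" + lastValue + " TO *}";
    }

    /** Sort clause that must accompany the filter for the paging to be stable. */
    static String sortClause(String field) {
        return field + " asc";
    }
}
```

Each request then starts at offset 0 with the new filter, so the cost of a page does not grow with how deep into the result set it is.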

Re: Optimize taking two steps and extra disk space

2011-06-20 Thread Michael McCandless
On Sun, Jun 19, 2011 at 12:35 PM, Shawn Heisey wrote: > On 6/19/2011 7:32 AM, Michael McCandless wrote: >> >> With LogXMergePolicy (the default before 3.2), optimize respects >> mergeFactor, so it's doing 2 steps because you have 37 segments but 35 >> mergeFact

Re: Optimize taking two steps and extra disk space

2011-06-20 Thread Michael McCandless
On Mon, Jun 20, 2011 at 4:00 PM, Shawn Heisey wrote: > On 6/20/2011 12:31 PM, Michael McCandless wrote: >> >> Actually, TieredMP has two different params (different from the >> previous default LogMP): >> >>   * segmentsPerTier controls how many segments you can

Re: Extending Solr Highlighter to pull information from external source

2011-06-20 Thread Michael Sokolov
I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not much going on there LUCENE-1522 has a lot of fascinating discussion on this topic though There is a couple of long lived issues in jira for this (I'd like to try to se

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Michael McCandless
PM, Michael McCandless wrote: >> >> With segmentsPerTier at 35 you will easily cross 70 segs in the index... >> If you want optimize to run in a single merge, I would lower >> segmentsPerTier and mergeAtOnce (maybe back to the 10 default), and set >> your maxMergeAtOnceExplic

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Michael McCandless
On Tue, Jun 21, 2011 at 9:42 AM, Shawn Heisey wrote: > On 6/20/2011 12:31 PM, Michael McCandless wrote: >> >> For back-compat, mergeFactor maps to both of these, but it's better to >> set them directly eg: >> >>     >>       10 >>       20 >&

Re: MultiValued facet behavior question

2011-06-22 Thread Michael Kuhlmann
On 22.06.2011 05:37, Bill Bell wrote: > It can get more complicated. Here is another example: > > q=cardiology&defType=dismax&qf=specialties > > > (Cardiology and cardiologist are stems)... > > But I don't really know which value in Cardiologist match perfectly. > > Again, I only want it to

Re: MultiValued facet behavior question

2011-06-22 Thread Michael Kuhlmann
On 22.06.2011 09:49, Bill Bell wrote: > You can type q=cardiology and match on cardiologist. If stemming did not > work you can just add a synonym: > > cardiology,cardiologist Okay, synonyms are the only way I can think of a realistic match. Stemming won't work on a facet field; you wouldn't g

Re: Inconsistent search results

2011-06-27 Thread Michael Kuhlmann
On 27.06.2011 15:56, Jihed Amine Maaref wrote: > - normalizedContents:(EDOUAR* AND une) doesn't return anything This was discussed a few days ago: http://lucene.472066.n3.nabble.com/Conflict-in-wildcard-query-and-spellchecker-in-solr-search-tt3095198.html > - normalizedContents:(edouar* AND un)

Re: Include synonys in solr

2011-06-28 Thread Michael Kuhlmann
On 28.06.2011 09:24, Romi wrote: > But as i suppose it would be very hard to include synonyms manually for each > word as my application has large data. > > I want to know is there any way that this synonym.text file generate > automatically referring to all dictionary words I don't get the poi

Using FieldCache in SolrIndexSearcher - crazy idea?

2011-06-28 Thread Michael Ryan
lated much to this question.) We already maintain FieldCaches for the fields that we are asking for, but for other purposes. Would it make sense to utilize these FieldCaches in SolrIndexSearcher? Is this something that anyone else has done before? -Michael

Re: Fuzzy Query Param

2011-06-29 Thread Michael McCandless
Which version of Solr (Lucene) are you using? Recent versions of Lucene now accept ~N > 1 to be edit distance. Ie foobar~2 matches any term that's <= 2 edit distance away from foobar. Mike McCandless http://blog.mikemccandless.com On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper wrote: > Accord
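To make "edit distance" concrete, here is a small sketch of the classic Levenshtein distance in plain Java. It is only an illustration of the metric behind foobar~2, not Lucene's implementation; the class and method names are hypothetical, and the comparison is done per Java char (UTF-16 code unit).

```java
// Illustrative only: textbook dynamic-programming Levenshtein distance.
final class EditDistance {

    /** Minimum number of single-char insertions, deletions, or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if candidate would fall within term~maxEdits under this metric. */
    static boolean withinDistance(String term, String candidate, int maxEdits) {
        return levenshtein(term, candidate) <= maxEdits;
    }
}
```

Under this metric, "fubar" is two edits away from "foobar" (one deletion, one substitution), so it would be within the ~2 threshold, while "bar" is three edits away and would not.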

Re: Regex replacement not working!

2011-06-29 Thread Michael Kuhlmann
On 29.06.2011 12:30, samuele.mattiuzzo wrote: > > ... > this is the "final" version of my schema part, but what i get is this: > > > > 1.0 > Negotiable > Negotiable > Negotiable > ... The mistake is that you assume that the filter is applied to the result. This is not true. Index

RE: Sorting by vale of field

2011-06-29 Thread Michael Ryan
I think this is also possible with custom function queries, but I've never done that. -Michael

Re: Fuzzy Query Param

2011-06-30 Thread Michael McCandless
Good question... I think in Lucene 4.0, the edit distance is (will be) in Unicode code points, but in past releases, it's UTF16 code units. Mike McCandless http://blog.mikemccandless.com 2011/6/30 Floyd Wu : > if this is edit distance implementation, what is the result apply to CJK > query? For
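The difference between code points and UTF-16 code units can be seen directly in Java. This is a small sketch with hypothetical class and method names; it uses U+20000, a CJK ideograph outside the Basic Multilingual Plane, as the example character.

```java
// Illustrative only: counts the same string in two different units.
final class CodePointsVsUnits {

    /** Length in UTF-16 code units, which is what String.length() reports. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Length in Unicode code points. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Builds a one-character string from a code point such as 0x20000. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```

A character from the Basic Multilingual Plane counts as one in both units, but fromCodePoint(0x20000) is one code point stored as two code units, so a distance measured in code units can count a single supplementary character as more than one position.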

RE: Returning total matched document count with SolrJ

2011-06-30 Thread Michael Ryan
SolrDocumentList docs = queryResponse.getResults(); long totalMatches = docs.getNumFound(); -Michael

Re: TermVectors and custom queries

2011-07-01 Thread Michael Sokolov
I think that's all you can do, although there is a callback-style interface that might save some time (or space). You still need to iterate over all of the vectors, at least until you get the one you want. -Mike On 6/30/2011 4:53 PM, Jamie Johnson wrote: Perhaps a better question, is this po

Re: Match only documents which contain all query terms

2011-07-02 Thread Michael Sokolov
I believe you should be able to get results ordered so that the documents you want will always come first, so you can truncate the results efficiently on the client side. You could also try a regexp query (untested): a b c -/~(a|b|c)/ -Mike On 7/1/2011 7:50 PM, Spyros Kapnissis wrote: Hello

Re: How do I add a custom field?

2011-07-03 Thread Michael Sokolov
You'll need to index the field. I would think you would want to index/store the field along with the associated document, in which case you'll have to reindex the documents as well - there's no single-field update capability in Lucene (yet?). -Mike On 7/3/2011 1:09 PM, Gabriele Kahlout wrote

Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
After your writer.commit you need to reopen your searcher to see the changes. Mike McCandless http://blog.mikemccandless.com On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout wrote: >    @Test >    public void testUpdate() throws IOException, > ParserConfigurationException, SAXException, ParseEx

Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
2011 at 8:09 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> After your writer.commit you need to reopen your searcher to see the >> changes. >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On T

Re: Field collapsing on multiple fields and/or ranges?

2011-07-06 Thread Michael McCandless
I believe the underlying grouping module is now technically able to do this, because subclasses of the abstract first/second pass grouping collectors are free to decide what type/value the "group key" is. But, we have to fix Solr to allow for compound keys by creating the necessary concrete subcla

Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Michael Kuhlmann
On 07.07.2011 16:14, Bob Sandiford wrote: > [...] (Without the optimize, 'deleted' records still show up in query > results...) No, that's not true. The terms remain in the index, but the document won't show up any more. Optimize is only for performance (and disk space) optimization, as the na

Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Michael Kuhlmann
On 07.07.2011 16:52, Mark juszczec wrote: > Ok. That's really good to know because optimization of that kind will be > important. Optimization is only important if you had a lot of deletes or updated docs, or if you want your segments to get merged. (At least that's what I know about it.) > > Wha

Re: Average PDF index time

2011-07-12 Thread Michael Kuhlmann
On 12.07.2011 12:03, alexander sulz wrote: > Still, why the PHP stops working correctly is beyond me, but it seems to > be fixed now. You should mind the max_execution_time parameter in your php.ini. Greetings, Kuli

Re: Result list order in case of ties

2011-07-12 Thread Michael Kuhlmann
On 12.07.2011 12:13, Lox wrote: > Hi, > > In the case where two or more documents are returned with the same score, is > there a way to tell Solr to sort them alphabetically? Yes, add the parameter sort=score desc,your_field_that_shall_be_sorted_alphabetically asc to your request. Greetings,
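When the request is built by hand, the sort value has to be URL-encoded because it contains spaces and a comma. A minimal sketch in plain Java follows; the class name and the field name title_sort used in the usage note are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a URL-encoded sort parameter that orders by
// score first and breaks ties alphabetically on a second field.
final class TieBreakSort {

    static String sortParam(String tieBreakField) {
        String sort = "score desc," + tieBreakField + " asc";
        try {
            return "sort=" + URLEncoder.encode(sort, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, sortParam("title_sort") yields sort=score+desc%2Ctitle_sort+asc, which can be appended to the query string of the select request.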

Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
On 13.07.2011 14:05, Gabriele Kahlout wrote: > this is what i was expecting. Otherwise updating a field of a document that > has an unstored but indexed field is impossible (without losing the unstored > but indexed field. I call this updating a field of a document AND > deleting/updating all its

Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
On 13.07.2011 15:37, Gabriele Kahlout wrote: > Well, I'm !sure how usual this scenario would be: > 1. In general those using solr with nutch don't store the content field to > avoid storing the whole web/intranet in their index, twice (1 in the form of > stored data, and one in the form of indexe

Re: Can I still search documents once updated?

2011-07-13 Thread Michael Kuhlmann
On 13.07.2011 16:09, Gabriele Kahlout wrote: > Solr is already configured by default not to store more than a > anyway. Usually one stores content only to display > snippets. Yes, but the snippets must come from somewhere. For instance, if you're using Solr's highlighting feature, all highligh

Re: Preserve XML hierarchy

2011-07-14 Thread Michael Sokolov
Have a look at http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor It might be just what you need? -Mike On 7/14/2011 3:31 AM, Lucas Miguez wrote: Hi, yes, I was asking about it, is it possible to index an XML file? Is it possible to know which node of the XML the search resu

LockObtainFailedException and open finalizing IndexWriters

2011-07-18 Thread Michael Kuhlmann
Hi, we are running Solr 3.2.0 on Jetty for a web application. Since we just went online and are still in beta tests, we don't have very much load on our servers (indeed, they're currently much oversized for the current usage), and our index size on file system is just 1.1 MB. We have one dedicat

Spatial Search with distance as a parameter

2011-07-19 Thread Michael Lorz
Hi all, I have the following problem: The documents in the index of my solr instance correspond to persons. Each document (=person) has lat/lon coordinates and additionally a travel radius. The coordinates correspond to the office of the person, the travel radius indicates a distance which the

Re: XInclude Multiple Elements

2011-07-21 Thread Michael Sokolov
The various XInclude specs were never really fully implemented by XML parsers. IMO it's really best for including whole XML files. If I remember right, the situation is that the xpointer() scheme (the most flexible) wasn't implemented. There are two other schemes for addressing content withi

Re: Logically equivalent queries but vastly different no of results?

2011-07-22 Thread Michael Kuhlmann
On 22.07.2011 14:27, cnyee wrote: > I think I know what it is. The second query has higher scores than the first. > > The additional condition "domain_ids:(0^1.3 OR 1)" which evaluates to true > always - pushed up the scores and allows a LOT more records to pass. This can't be, because the scor

Re: Preserve XML hierarchy

2011-07-26 Thread Michael Sokolov
Here's an idea: if you index the full text of your XML document using XmlCharFilter - available as a patch (or HtmlCharFilter), and then highlight the entire document (you will need to fiddle with highlighter parameters a bit to make sure you get 1 fragment that covers the entire file) with som

Re: SolrJ and class versions

2011-07-26 Thread Michael Sokolov
It's not clear to me (from the wiki, or the jira issue) whether the compatibility break goes both ways - maybe I should just try and see, but just to get this out there on the list: is the 3.X javabin client able to talk to 1.4 servers? If so, then there is a nicely decoupled upgrade path: get

RE: schema.xml changes, need re-indexing ?

2011-07-27 Thread Michael Ryan
You should be fine - no need to re-index your data. Adding and removing fields is generally safe to do without a re-index. Changing a field (its type, analyzers, etc) requires more caution and generally does require a re-index. -Michael

Reusing SolrServer instances when swapping cores

2011-07-28 Thread Michael Szalay
Hi all We work with two cores ("active" and "passive") and swap them when the reindexing is finished. Is it allowed to reuse the same instance of the SolrServer (both Embedded and Common)? I.e. do they point to the "other" core after the swapping? Regards Mich

Looking for a senior search engineer

2011-07-29 Thread Michael Economy
nfo on our website: http://.goodreads.com/about/us Michael Economy Director Engineering, Goodreads Inc.

Re: slow highlighting because of stemming

2011-07-30 Thread Michael Sokolov
On 7/30/2011 3:46 AM, Orosz György wrote: Hi, Thanks for the answer! I am doing some logging about stemming, and what I can see is that a lot of tokens are stemmed for the highlighting. It is the strange part, since I don't understand why does any highlighter need stemming again. Consider that t

Re: Solr request filter and indexing process

2011-07-31 Thread Michael Sokolov
The first thing that comes to mind is to check whether you are committing after every insert. A number of things may happen when you commit, including merges, rebuilding the spelling dictionary (is this still true in 3.3? maybe not). It's better to commit after a batch of inserts. -Mike O

Re: Solr can not index "F**K"!

2011-07-31 Thread Michael Sokolov
On 7/31/2011 7:29 PM, randohi wrote: org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_33 } Could something be going on here? What's in your "protwords.txt" ? -Mike

Re: Store complete XML record (DIH & XPathEntityProcessor)

2011-08-01 Thread Michael Sokolov
On 8/1/2011 6:17 AM, Chantal Ackermann wrote: If you are looking for a config-only solution - i'm not sure that there is one. Someone else might be able to comment on that? You might want to take a look at SOLR-2597; it has a patch for XmlStripCharFilter, which will strip tags from XML for inde

Re: segment.gen file is not replicated

2011-08-04 Thread Michael McCandless
This file is actually optional; it's there for redundancy in case the filesystem is not "reliable" when listing a directory. Ie, normally, we list the directory to find the latest segments_N file; but if this is wrong (eg the file system might have a stale cache) then we fall back to reading the seg

Re: segment.gen file is not replicated

2011-08-04 Thread Michael McCandless
I think we should fix replication to copy it? Mike McCandless http://blog.mikemccandless.com On Thu, Aug 4, 2011 at 8:16 AM, Bernd Fehling wrote: > > > On 04.08.2011 12:52, Michael McCandless wrote: >> >> This file is actually optional; it's there for redundancy in case

"Weighted" facet strings

2011-08-05 Thread Michael Lorz
Hi all, I have documents which are (manually) tagged with categories. Each category-document relation has a weight between 1 and 5: 5: document fits perfectly in this category, . . 1: document may be considered as belonging to this category. I would now like to use this information with so

RE: How come this query string starts with wildcard?

2011-08-10 Thread Michael Ryan
I think this is because ")" is treated as a token delimiter. So "(foo)bar" is treated the same as "(foo) bar" (that is, bar is treated as a separate word). So "(foo)*" is really parsed as "(foo) *" and thus the * is treated as the start of a new word. -Michael

RE: copyfields in schema.xml

2011-08-11 Thread Michael Ryan
Nope. The 'text' field will just have the 'titulo' contents. To have both, you would have to do something like this: -Michael

Re: Some questions about SolrJ

2011-08-13 Thread Michael Sokolov
On 8/12/2011 4:18 PM, Shawn Heisey wrote: On 8/12/2011 1:49 PM, Shawn Heisey wrote: I am sure that I have more questions, but I may be able to answer a lot of them myself if I can see better examples. Thought of another question. My Perl build system uses DIH for all indexing, but with the J

Re: Some questions about SolrJ

2011-08-13 Thread Michael Sokolov
Shawn, my experience with SolrJ in that configuration (no autoCommit) is that you have control over commits: if you don't issue an explicit commit, it won't happen. Re lifecycle: we don't use a static instance; rather our app maintains a small pool of CommonsHttpSolrServer instances that we

Re: SOLR 3.3.0 multivalued field sort problem

2011-08-13 Thread Michael Lackhoff
erts: some solution is better than no solution. -Michael

Re: SOLR 3.3.0 multivalued field sort problem

2011-08-13 Thread Michael Lackhoff
e to have at least something. Any possible customization would be an extra bonus. -Michael

Re: SOLR 3.3.0 multivalued field sort problem

2011-08-13 Thread Michael Lackhoff
lues) As long as sorting is only allowed on single value fields, both are identical. As soon as you allow multivalued fields to be sorted on, both interpretations mean something different and I think both have their valid use case. But I don't want to stress this too far. -Michael

RE: Solr Accent Insensitive and sensitive search

2011-08-17 Thread Michael Ryan
Are you using the same analyzer for both type="query" and type="index"? Can you show us the fieldType from your schema? -Michael

Re: Solr Join in 3.3.x

2011-08-18 Thread Michael McCandless
Unfortunately Solr's join impl hasn't been backported to 3.x, as far as I know. You might want to look at ElasticSearch; it has a join implementation already or use Solr 4.0. Mike McCandless http://blog.mikemccandless.com On Wed, Aug 17, 2011 at 7:40 PM, Cameron Hurst wrote: > Hello all, >

Requiring multiple matches of a term

2011-08-19 Thread Michael Ryan
og"~20 returns 9291 results "dog dog dog dog"~30 returns 6395 results Anyone ever do something like this and know how I can accomplish this? -Michael

heads up: re-index trunk Lucene/Solr indices

2011-08-20 Thread Michael McCandless
Hi, I just committed a new block tree terms dictionary implementation, which requires fully re-indexing any trunk indices. See here for details: https://issues.apache.org/jira/browse/LUCENE-3030 If you are using a released version of Lucene/Solr then you can ignore this message. Mike McCan

RE: Requiring multiple matches of a term

2011-08-21 Thread Michael Ryan
t may or may not match. I think the HashSet.toArray() call is to blame here, but I don't yet fully understand the expected behavior of the initPhrasePositions function... -Michael

How to copy and extract information from a multi-line text before the tokenizer

2011-08-23 Thread Michael Kliewe
hint, or a totally different method to achieve my goal to extract a single line from this multi-line-text. Kind regards and thanks for any help Michael

Copying cores with solrj?

2011-08-24 Thread Michael Szalay
" so that I can have the current state to start with. I'm missing the "COPY"-Core admin request. How can I copy the index of the first core to the second one in an efficient manner? Regards Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 B

Optimize requires 50% more disk space when there are exactly 20 segments

2011-08-24 Thread Michael Ryan
first when merging the 20 segments down to 2, then again when merging from 2 to 1. I would like to avoid this if at all possible, as it requires 50% more disk space and takes almost twice as long to optimize. Would using TieredMergePolicy help me here, or some other config I can change? -Michael

RE: Query vs Filter Query Usage

2011-08-25 Thread Michael Ryan
will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB. -Michael
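The "1 bit per doc" arithmetic can be checked with a few lines of plain Java. This is a back-of-the-envelope sketch with hypothetical names: it assumes the bits are held in an array of 64-bit words, and the figure of 10 million documents in the usage note is an assumption chosen only because it lands near the roughly 1.2MB mentioned above; the actual index size is not given in the message.

```java
// Illustrative only: estimates the memory taken by one bitset-backed
// filter entry, independent of how many documents actually match.
final class BitDocSetSize {

    /** Bytes needed for one bit per document, rounded up to whole 64-bit words. */
    static long bitSetBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64; // round up to a full long
        return words * 8;                // 8 bytes per long
    }
}
```

With an assumed 10,000,000 documents, bitSetBytes returns 1,250,000 bytes, or about 1.2MB per cached filter, whether the filter matches ten documents or ten million.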

Wildcard queries on whole words

2012-06-27 Thread Klostermeyer, Michael
I am researching an issue w/ wildcard searches on complete words in 3.5. For example, searching for "kloster*" returns "klostermeyer", but "klostermeyer*" returns nothing. The field being queried has the following analysis chain (standard 'text_general'):

RE: Wildcard queries on whole words

2012-06-27 Thread Klostermeyer, Michael
Interesting solution. Can you then explain to me for a given query: ?q='kloster' OR kloster* How the "exact match" part of that is boosted (assuming the above is how you formulated your query)? Thanks! Mike -Original Message- From: Michael Della Bitta [mai

RE: Strange "spikes" in query response times...any ideas where else to look?

2012-06-28 Thread Michael Ryan
logLatency in jetty.xml) 3) The "QTime" as returned in the Solr response 3) Are you running multiple queries concurrently, or are you just using a single thread in JMeter? -Michael -Original Message- From: s...@isshomefront.com [mailto:s...@isshomefront.com] Sent: Thursday, June

RE: NGram and full word

2012-06-29 Thread Klostermeyer, Michael
With the help of this list, I solved a similar issue by altering my query as follows: Before (did not return full word matches): q=searchTerm* After (returned full-word matches and wildcard searches as you would expect): q=searchTerm OR searchTerm* You can also boost the exact match by doing th
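The "after" form of the query can be assembled with a one-line helper. This sketch uses hypothetical names and assumes the search term is a single word without query-syntax characters; it deliberately leaves out the boost on the exact match, since the exact syntax is cut off in the message above.

```java
// Hypothetical helper: combines an exact-term clause with a prefix
// (wildcard) clause so that complete words still match.
final class ExactOrPrefix {

    static String query(String term) {
        // The first clause goes through normal analysis; the second is the wildcard.
        return term + " OR " + term + "*";
    }
}
```

For example, query("klostermeyer") produces klostermeyer OR klostermeyer*, which is the value to send as the q parameter.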

leap second bug

2012-07-01 Thread Michael Tsadikov
Our solr servers went into GC hell, and became non-responsive on date change today. Restarting tomcats did not help. Rebooting the machine did. http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/

Re: leap second bug

2012-07-01 Thread Michael McCandless
http://blog.mikemccandless.com On Sun, Jul 1, 2012 at 8:08 AM, Óscar Marín Miró wrote: > Hello Michael, thanks for the note :) > > I'm having a similar problem since yesterday, tomcats are wild on CPU [near > 100%]. Did your solr servers not reply to index/query requests? > > Thanks :

DIH - unable to ADD individual new documents

2012-07-02 Thread Klostermeyer, Michael
I am not able to ADD individual documents via the DIH, but updating works as expected. The stored procedure that is called within the DIH returns the expected data for the new document, Solr appears to "do its thing", but it never makes it to the Solr server, as evidence that subsequent querie

RE: DIH - unable to ADD individual new documents

2012-07-02 Thread Klostermeyer, Michael
I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds. Mike -Original Message- From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] Sent: Monday, July 02, 2012 5:41 PM To: solr-user@lucene.apache.org Subject

RE: DIH - unable to ADD individual new documents

2012-07-02 Thread Klostermeyer, Michael
ta when I run the SP directly. Mike -Original Message- From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] Sent: Monday, July 02, 2012 8:24 PM To: solr-user@lucene.apache.org Subject: RE: DIH - unable to ADD individual new documents I should add that I am using the full-i

Re: Near Real Time Indexing and Searching with solr 3.6

2012-07-03 Thread Michael McCandless
Hi, You might want to take a look at Solr's trunk (very soon to be 4.0.0 alpha release), which already has a near-real-time solution (using Lucene's near-real-time APIs). Lucene has NRTCachingDirectory (to use RAM for small / recently flushed segments), but I don't think Solr uses it yet. Mike M

RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Klostermeyer, Michael
nts On 3 July 2012 07:54, Klostermeyer, Michael wrote: > I should add that I am using the full-import command in all cases, and > setting clean=false for the individual adds. What does the data-import page report at the end of the full-import, i.e., how many documents were indexed? Are

RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Klostermeyer, Michael
eficial for performance all around. Of course if you're trying to do this with the near-real-time functionality batching isn't your answer. But DIH isn't designed at all to work well with NRT either... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message-

RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Klostermeyer, Michael
the Tika stuff pretty easily.. Best Erick On Tue, Jul 3, 2012 at 3:35 PM, Klostermeyer, Michael wrote: > Well that little bit of knowledge changes things for me, doesn't it? I > appreciate your response very much. Without knowing that about the DIH, I > attempted to h

Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl
'd like to prevent storing the texts because of space issues, but if that's the only reasonable solution... . Thank you, Michael

Re: leap second bug

2012-07-04 Thread Michael Tsadikov
ate; > /etc/init.d/ntp start > > And tomcat magically switched from 100% CPU to 0.5% :) > > From: > > > https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY > > [from Michael McCandless help on this thread] > > On Sun, Jul 1,

Re: Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl
ts and research, using the debugQuery method seems the only viable solution(?) Cheers, Michael
