Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Mike Klaas
g rewrite() before using the Highlighter? It is, in trunk/: NamedList sumData = HighlightingUtils.doHighlighting( results.docList, query.rewrite(req.getSearcher().getReader()), req, new String[]{defaultField}); Definitely a bug somewhere. Does anyone more familiar with lucene see why the above wouldn't be sufficient? -Mike

Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Mike Klaas
On 3/23/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 3/23/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > Definitely a bug somewhere. Does anyone more familiar with lucene see > why the above wouldn't be sufficient? Perhaps our use of ConstantScorePrefixQuery by defaul

Re: return matched terms / fuzzy or wildcard searches

2007-03-24 Thread Mike Klaas
you want highlighting -- so instead of dn* search for dn?* Note that you need a recent nightly build for that to work--it wasn't there for the last release. -Mike

Re: How to wildcard search with colons?

2007-03-26 Thread Mike Klaas
removes the : and everything after it q=trackURL:http%3A//host* <-- doesn't work, same as above q=trackURL:http*host* <-- TooManyClauses exception, not what I want anyway Have you tried: trackURL:http\://host* -Mike

Re: which one will save hard disk space?

2007-03-26 Thread Mike Klaas
e completely disjoint: indexing is a lossy operation, so if you want to be able to retrieve the original contents, they must be stored separately (i.e., the first option uses the least space). -Mike

Re: which one will save hard disk space?

2007-03-26 Thread Mike Klaas
loss. So you don't need to store it separately. what do you think? In theory that might be true, but lucene is not implemented that way, I'm afraid. If this is the a priori situation, it is probably easier to implement this outside of lucene and "store" the id in your external index. -Mike

Re: How to make the search default use AND instead of OR?

2007-03-27 Thread Mike Klaas
ly be achieved by using a high percentage setting, but I'd have to double check how the rounding is done). -Mike

Re: maximum index size

2007-03-27 Thread Mike Klaas
a document for each customer then some field must indicate to which customer the document instance belongs. In that case, why not index a single copy of each document, with a field containing a list of customers having access? -Mike

Re: storing results

2007-03-27 Thread Mike Klaas
you clarify what you're looking for Solr to do for you? -Mike

Re: maximum index size

2007-03-27 Thread Mike Klaas
docs. I don't have much lucene-sort fu, though, so an expert should chime in... -Mike

Re: Document boost not as expected...

2007-03-27 Thread Mike Klaas
s incorporated into the fieldNorm and so is modified by the lengthNorm. Further, during query the term idf, queryNorm come into play. You shouldn't expect that the document boost will be returned as the document score (although you should expect it to affect it). -Mike

Re: How to make the search default use AND instead of OR?

2007-03-27 Thread Mike Klaas
he/solr/util/doc-files/min-should-match.html Indeed it is, though I wasn't aware of the detailed documentation. Not that it is that hard to find, but it is three links away from the main dismax page on the wiki. I might add a link directly from the main dismax javadoc to help people like me find it. -Mike

Re: Solr finding doc by one field but not by another

2007-03-28 Thread Mike Klaas
aps via StandardAnalyzer) occurred in the original index. -Mike

Re: Document boost not as expected...

2007-03-28 Thread Mike Klaas
field. -Mike

Re: Document boost not as expected...

2007-03-28 Thread Mike Klaas
On 3/28/07, escher2k <[EMAIL PROTECTED]> wrote: Mike, I am not doing anything custom for this test. I am assuming that the Default Similarity is used. Surprisingly, if I remove the document level boost (set to 1.0) and just have a field level boost, the result seems to be correct. A

Re: maximum index size

2007-03-29 Thread Mike Klaas
1.com/msg/91276.html, which doesn't exactly boast completion of such a feat! -Mike On 3/28/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi Mike, I'm curious about what you said there: "People have constructed (lucene) indices with over a billion documents.". Are y

Re: Changing encoding norms and boosting...

2007-03-29 Thread Mike Klaas
also store doc boosts that span a wider dynamic variance (1 to 15 rather than 1 to 1.5, say), then compensate by applying a query-time boost of 0.1. -Mike

Re: Question: how to config memory with SolrPerformanceFactor

2007-03-29 Thread Mike Klaas
On 3/29/07, James liu <[EMAIL PROTECTED]> wrote: i find solr alway can work when i delete index,,i think it maybe cached in memory. Tricky, eh? Unix doesn't _truly_ delete the files until all processes have closed them or terminated. -Mike
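The behaviour described above can be reproduced outside Solr. This is a minimal sketch for Unix-like systems only; `read_after_unlink` is a hypothetical helper name and has nothing to do with Solr's own code:

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, unlink it while it is still open,
    and read it back through the open handle (POSIX semantics assumed)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                      # directory entry is removed here
        still_listed = os.path.exists(path)  # the name is gone ...
        handle.seek(0)
        data = handle.read()                 # ... but the data is still readable
    return still_listed, data
```

A running Solr that still has the index files open is in the same position as `handle` here, which would explain why searches keep working until the process restarts or reopens its searcher.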

Re: Troubleshooting java heap out-of-memory

2007-04-02 Thread Mike Klaas
re memory after their datastructures have been built, so it would be odd to see OOM after 48 hours if they were the cause. -Mike

Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas
ately equal values anyway. -Mike

Re: Access filterCache/queryResultCache/documentCache

2007-04-04 Thread Mike Klaas
big enough for faceting. i could use the same thing! It would also be useful to (for instance) insert a filter into the filter cache that could be subsequently used by a query. Obviously, this is really only useful for filters that aren't constructed from a query -> docset. -Mike

Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas
On 4/4/07, James liu <[EMAIL PROTECTED]> wrote: 2007/4/5, Mike Klaas <[EMAIL PROTECTED]>: > > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote: > > That means now i can' solve it with solr? > > Not out-of-the-box, no. But you can certainly query your sl

Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas
3. put all returned documents into an array, and reverse sort by score 4. select documents [N, N+M) from this array. This is a relatively simple task. It gets more complicated once multiple passes, idf compensation, deduplication, etc. are added. -Mike
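Steps 3 and 4 above could be sketched roughly as follows. The sketch assumes each partition has already returned its top hits as `(doc_id, score)` pairs and that scores are comparable across partitions (no idf compensation); `merge_page` is a hypothetical name:

```python
def merge_page(partition_results, start, rows):
    """Merge per-partition hits and return documents [start, start + rows).

    partition_results: list of lists of (doc_id, score) tuples, one list
    per partition (an assumed shape, not something Solr returns as-is).
    """
    # Step 3: put all returned documents into one array ...
    merged = [hit for hits in partition_results for hit in hits]
    # ... and reverse sort by score (highest first).
    merged.sort(key=lambda hit: hit[1], reverse=True)
    # Step 4: select documents [N, N+M) from this array.
    return merged[start:start + rows]
```

For the result to be correct, each partition has to contribute at least its top N+M hits, since in the worst case the whole page comes from a single partition.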

RE: C# API for Solr

2007-04-05 Thread Mike Austin
I would be very interested in this. Any idea on when this will be available? Thanks -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 1:44 AM To: solr-user@lucene.apache.org Subject: Re: C# API for Solr Well, i think there will be a lot of

Re: Solr logo poll

2007-04-06 Thread Mike Klaas
On 4/6/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: A) http://issues.apache.org/jira/secure/attachment/12349897/logo-solr-d.jpg

Re: Python utilities for solr?

2007-04-15 Thread Mike Klaas
lable) from Python objects. Then again, JSON for posting would be really nice to have :) It is not documented very well, but you can pass in a multi-map to the solr.py client: .add(field_one=['one', 'two', 'three'], field_two='value', ...) -Mike

limiting the rows returned for a query

2007-04-18 Thread mike topper
d whatnot which makes it also a little bit harder to do this way. anyways hope that makes sense, let me know! -Mike

Re: Facet Browsing

2007-04-19 Thread Mike Klaas
k to a page that includes a link to the nightly build and CHANGES.txt, or the release package for already-released versions. -Mike

Re: help need on words with special characters

2007-04-19 Thread Mike Klaas
your existing analyzer to recognize it (WordDelimiterFilter if you are using the standard text field in the Solr example). If it is complicated, you should look into creating your own analyzer. -Mike

Re: Filter question...

2007-04-19 Thread Mike Klaas
' with 1000 words of 'delhi', highest score to matches having the words nearby -Mike

Re: Avoiding caching of special filter queries

2007-04-20 Thread Mike Klaas
pulate it during the main query, and grab ids from the cache during the highlighting step. -Mike

Re: Solr performance warnings

2007-04-20 Thread Mike Klaas
ecause the new posts must be available in the search as soon as they are posted. Do you think there is a way to optimize this? "As soon as" is a rather vague requirement. If you can specify the minimum acceptable delay, then you can use Solr's autocommit functionality to trigger commits. -Mike

Re: sorting by matched field, then title alpha

2007-04-20 Thread Mike Klaas
roximate it by doing something like: A:"phrase"^10 B:"phrase"^1 C:"phrase"^1000 D:"phrase"^100 E:"phrase"^30 HTH, -Mike
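A query of that shape can be generated rather than typed by hand. A minimal sketch, where `boosted_phrase_query` is a hypothetical helper and the field names and boosts are simply the ones from the message above:

```python
def boosted_phrase_query(phrase, boosts):
    """Build one field:"phrase"^boost clause per field, space-separated.

    boosts: mapping of field name -> boost, in the order the clauses
    should appear.
    """
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return " ".join(
        '%s:"%s"^%g' % (field, escaped, boost)
        for field, boost in boosts.items()
    )
```

Widely separated boosts only approximate a strict sort by matched field: a very strong match on a low-boost field can still outrank a weak match on a high-boost one.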

Re: Solr performance warnings

2007-04-20 Thread Mike Klaas
a few weeks after 1.1 was cut. I suggest using a nightly build from Feb 2 or later, or waiting until 1.2 is released. cheers, -Mike

Re: Facet Browsing

2007-04-22 Thread Mike Klaas
e. Sounds good. If it is sufficiently unobtrusive, it probably isn't even necessary to change it later. -Mike

Re: solr utf 16 ?

2007-04-23 Thread Mike Klaas
I couldn't give you a timeline. For the time being, consider that 1. utf-8 is the "lingua franca" of xml document encoding 2. it is very easy to convert it yourself (it would be a 3-4 line python commandline filter, frinstance). -Mike
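The conversion mentioned in point 2 might look something like this. It is a sketch only: `utf16_to_utf8` is a hypothetical name, and the input is assumed to carry a byte-order mark (or to be in native byte order):

```python
def utf16_to_utf8(data):
    """Re-encode a UTF-16 byte string as UTF-8.

    The 'utf-16' codec consumes a leading BOM if present, so no BOM
    ends up in the UTF-8 output.
    """
    return data.decode("utf-16").encode("utf-8")
```

As a command-line filter this would be wired to `sys.stdin.buffer.read()` and `sys.stdout.buffer.write(...)`. If the document has an XML declaration that says `encoding="UTF-16"`, that declaration would need to be changed as well.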

Re: Re[2]: Things are not quite stable...

2007-04-25 Thread Mike Klaas
nt of or limitation to a particular package. -Mike

Re: Re[4]: Things are not quite stable...

2007-04-25 Thread Mike Klaas
the default package with jetty can be used for production. Do you know that Jetty is the culprit? We've been successfully using it for production purposes. -Mike

Re: Re[6]: Things are not quite stable...

2007-04-25 Thread Mike Klaas
and since the provided container has never given me any (significant[1]) issues, I've kept with it. [1] Aside from XML-escaping irregularities that were discussed on the list last year. -Mike

Solr index updating pattern

2007-04-25 Thread Mike Austin
off of windows server so I haven't even looked into the snappuller etc.. stuff. Thanks, Mike

Re: Additive scoring using Dismax...

2007-04-26 Thread Mike Klaas
andler. -Mike

Re: Solr index updating pattern

2007-04-26 Thread Mike Klaas
On 4/25/07, Mike Austin <[EMAIL PROTECTED]> wrote: Could someone give advice on a better way to do this? I have an index of many merchants and each day I delete merchant products and re-update my database. After doing this I then re-create the entire index and move it to production rep

RE: Solrsharp feedback

2007-04-26 Thread Mike Austin
he work. I might actually be able to contribute some code to this at some point... maybe in conjunction with my solr servlet code and how I do faceting and category navigation. Thanks, Mike -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Thursday, April 26, 2007

Re: NullPointerException (not schema related)

2007-05-01 Thread Mike Klaas
example, if I called Runtime.exec with a command of "test_program" (which is a bash script), it failed. If I called Runtime.exec with a command of "/bin/bash test_program" it worked. Yes, Runtime.exec does not invoke a shell automatically, so shebang lines, shell built-ins, io redirection, etc. cannot be used directly. -Mike

Re: Snippet Generation at Punctuation Marks?

2007-05-03 Thread Mike Klaas
're hoping to fix that asap. See http://issues.apache.org/jira/browse/SOLR-102 for my solution to this problem. The idea is that you'd like to split at sentence boundaries, but also not stray too far from the desired fragment size. It would be great to get comments on/improvements to this approach. -Mike

Re: facet.sort does not work in python output

2007-05-03 Thread Mike Klaas
. There is some past discussion on the list if you search the archives. -Mike

Re: solr.py - set boosts?

2007-05-03 Thread Mike Klaas
browse/SOLR-216 -Mike

Re: Look ahead queries

2007-05-03 Thread Mike Klaas
actly what you're planning on doing). Typically, the feature you are talking about is implemented by analyzing query logs, which are a much more relevant corpus than the raw documents in this context. I suggest focusing your efforts in that direction (possibly checking to see if someone has done this with lucene already...) cheers, -Mike

Re: Re[2]: facet.sort does not work in python output

2007-05-04 Thread Mike Klaas
sorted(facet.field_values.items(), key=lambda x: x[1], reverse=True) or even from operator import itemgetter sorted(facet.field_values.items(), key=itemgetter(1), reverse=True) digressionally, -Mike
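A self-contained version of the second form, using a plain dict in place of the `facet.field_values` object from the message (whose exact type is not shown here):

```python
from operator import itemgetter


def facets_by_count(field_values):
    """Return (value, count) pairs sorted by count, highest first.

    field_values: mapping of facet value -> count, standing in for the
    client's facet object.
    """
    return sorted(field_values.items(), key=itemgetter(1), reverse=True)
```

`sorted` is stable, so values with equal counts keep the order in which the mapping yields them.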

Re: something i think about "facet"

2007-05-07 Thread Mike Klaas
You could easily store all 100 facets, display the first ten and fill in the rest with some (hidden) javascript when the user clicks a button (or re-request the facets from Solr with a higher threshold). -Mike

adjusting score slightly by date field

2007-05-09 Thread mike topper
records in the order of results while still maintaining the scoring? -Mike

Re: Facet only support english?

2007-05-09 Thread Mike Klaas
TF-8 by default. Any objections? No--I'm not sure that it'll bring clarity for anyone who isn't aware of xml encoding issues, but I can't see it hurting. -Mike

Re: Facet only support english?

2007-05-09 Thread Mike Klaas
On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > +1 on explicit encoding declarations. Done (even though it really wasn't needed since there were no int'l chars in the example). As Mike points out, it only marginally helps... if the user adds international chars to the

Re: dates & times

2007-05-10 Thread Mike Klaas
be difficult to create a patch if you were interested, but I'm curious: What about XSL makes what seems to me an elementary string-processing task so difficult? regards -Mike

Re: delete for multiple documents at once

2007-05-11 Thread Mike Klaas
sing delete by query: docId:XXX OR docId:YYY OR docId:ZZZ ... -Mike
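Building that delete-by-query message for an arbitrary list of ids could be sketched like this. `delete_by_ids_xml` is a hypothetical helper, `docId` is the field name from the message, and the ids are assumed to be plain tokens that need no query-syntax escaping:

```python
from xml.sax.saxutils import escape


def delete_by_ids_xml(ids, field="docId"):
    """Return a Solr XML delete-by-query message matching any of the ids."""
    query = " OR ".join("%s:%s" % (field, doc_id) for doc_id in ids)
    # The query is XML text content, so &, < and > must be escaped.
    return "<delete><query>%s</query></delete>" % escape(query)
```

Very long id lists may run into the boolean clause limit (the TooManyClauses error mentioned elsewhere in this listing), so sending the ids in batches is probably safer.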

Re: Solr concurrent commit not updated

2007-05-11 Thread Mike Klaas
readedly if you want some concurrency). regards, -Mike

Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas
tition "runs out" of docs before it is done, request a new round. -Mike

Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas
On 14-May-07, at 6:49 PM, James liu wrote: 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>: On 14-May-07, at 1:35 AM, James liu wrote: When you get up to 60 partitions, you should make it a multi stage process. Assuming your partitions are disjoint and evenly distributed, estimate the num

Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas
e the docs from 0 to N for each partition (whether through one request or multiple). -Mike 2007/5/15, James liu <[EMAIL PROTECTED]>: 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>: > > On 14-May-07, at 1:35 AM, James liu wrote: > > > if use multi index box,

Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas
order. You have to perform that sort manually. so it will not sorted by score correctly. and if user click page 2 to see, how to show data? p1 start from 10 or query other partitions? Assemble results 1 through 20, then display 11-20 to the user. -Mike 2007/5/15, Mike Klaas <[EMAIL P

Re: Question: Pagination with multi index box

2007-05-15 Thread Mike Klaas
On 14-May-07, at 10:05 PM, James liu wrote: 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>: I'm not ignoring it: I'm implying that the above is the correct descending score-sorted order. You have to perform that sort manually. i mean merged results(from 60 p) and sort it,

Re: Requests per second/minute monitor?

2007-05-15 Thread Mike Klaas
hat timing/statistics might be handleable on a larger scale. OTOH, it does give an easy way to requesthandlers to insert detailed timing data in a logical place in the output. -Mike

PriceJunkie.com using solr!

2007-05-16 Thread Mike Austin
categories - simple xml configuration for the final outputted category configuration file I'm sure there are more cool things but that is all for now. Join the mailing list to see more improvements in the future. Also.. how do I get added to the Using Solr wiki page? Thanks, Mike Austin

how to use function queries

2007-05-21 Thread mike topper
need to use a different query handler? -Mike

Re: How to handle hl.fl form variable (any variable with a dot in its name) from javascript?

2007-05-22 Thread Mike Klaas
e in programming languages. Note that the URL parameter is not a variable in Solr. Your problems seem to be occurring due to the use of systems that attempt to map data to variable names, which seems to me like a worse idea than using '.' in url parameters. regards, -Mike

Re: List of highlighted terms from search query

2007-05-23 Thread Mike Klaas
't appearing. Could you clarify what you mean? What analyzers are you using? -Mike Thanks -Amit James liu wrote: first u try enable highlighting( http://wiki.apache.org/solr/HighlightingParameters) and u try solr admin gui to see its output and u will find what u wanna. 2007/5/23, s

RE: PriceJunkie.com using solr!

2007-05-23 Thread Mike Austin
search and noticed pages were executed through aspx. Are you using .net to parse the xml results from SOLR? Nice site, just trying to figure out where SOLR fits into this. On 5/16/07, Mike Austin <[EMAIL PROTECTED]> wrote: I just wanted to say thanks to everyone for the creation o

RE: PriceJunkie.com using solr!

2007-05-23 Thread Mike Austin
ery nice job! > It's fast too. > > -Yonik > > On 5/16/07, Mike Austin <[EMAIL PROTECTED]> wrote: > > I just wanted to say thanks to everyone for the creation of solr. I've > been > > using it for a while now and I have recently brought one of my side > pro

Re: AW: Re[2]: add and delete docs at same time

2007-05-25 Thread Mike Klaas
any non-trivial size index. -Mike

Re: facet should add facet.analyzer

2007-05-28 Thread Mike Klaas
ess to the unindexed version? My suggestion would be to copyField into an unanalyzed version, and facet on that. cheers, -Mike

Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread Mike Klaas
and others not -- while still leaving all other options open. Define two fieldTypes, and use one for "tokenized" analysis and another for "untokenized"? -Mike

Re: facet question

2007-05-31 Thread Mike Klaas
). Do you really have 1.5M unique values in that field? Are you analyzing the field (you probably shouldn't be)? -Mike

Re: facet question

2007-05-31 Thread Mike Klaas
suspicious about your application. You have 1.5M distinct tags for 4M documents? That seems quite dense. -Mike

Re: question about highlight field

2007-06-01 Thread Mike Klaas
this exactitude to carry forth in your highlighting, specify hl.requireFieldMatch=true. -Mike

Re: Indexing a lot of documents?

2007-06-01 Thread Mike Klaas
pen file limit (I upped mine from 1024 to 45000 to handle huge indices). You can alleviate this by reducing the mergeFactor, but this can impact indexing performance. And: is there a way to just hand the XML file to Solr without having to POST it? No, but POST'ing shouldn't be a bottleneck. -Mike

Re: Indexing a lot of documents?

2007-06-02 Thread Mike Klaas
log_{base mergefactor}(numDocs) * mergeFactor segments, approximately. -Mike
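Reading the expression above as a rough count of index segments, a quick calculation gives a feel for the numbers. This is a sketch only: `estimated_segments` is a hypothetical name, and the result is only as good as the approximation it encodes:

```python
import math


def estimated_segments(num_docs, merge_factor):
    """Approximate segment count: log_{mergeFactor}(numDocs) * mergeFactor."""
    return math.log(num_docs, merge_factor) * merge_factor
```

For example, a million documents with a mergeFactor of 10 comes out at roughly 60 segments. Each segment is made up of several files, which is why the open-file limit comes up in this thread.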

Re: question about highlight field

2007-06-04 Thread Mike Klaas
't even searching them. One option is to search those fields directly, using dismax. In that case, the highlight fields will be picked up automatically. -Mike

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Mike Klaas
ted at all (check disk usage stats). How is Solr caching better than this? It is unrelated. Solr can cache certain reusable components of queries (namely, filters), and provides for fully-customizable schema and arbitrary query execution on it. -Mike

Re: solr+hadoop = next solr

2007-06-07 Thread Mike Klaas
on. Solr is an open-source project, so huge features will get implemented when there is a person or group of people devoted to leading the charge on the issue. If you're interested in being that person, that's great! -Mike

Re: TextField case sensitivity

2007-06-07 Thread Mike Klaas
http://issues.apache.org/jira/browse/SOLR-257 I'll probably commit it in a day or so, at which point it will be part of the Solr nightly build. -Mike

Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Mike Klaas
hen using a high mm (minimum #clauses match) setting with dismax, as it effectively requires 'in' to be in the url column, which was probably not the intent of the query. -Mike

Re: highlight and wildcards ?

2007-06-07 Thread Mike Klaas
efix query. -Mike

Re: Boosting in version 1.2

2007-06-08 Thread Mike Klaas
2 to specify more than one default search field, or is the above solution still the way to go? This is precisely the situation that the dismax handler was designed for. Plus, you don't have to fiddle around with document boosts. try: qt=dismax q=letters qf=keywords^3.0 title^2.0 content -Mike
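Those three parameters have to be URL-encoded when sent over HTTP, because `qf` contains spaces and `^`. A minimal sketch; the base URL is an assumption (any Solr select endpoint would do) and `dismax_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def dismax_url(base_url, query, qf):
    """Build a select URL using the dismax handler with per-field boosts."""
    params = {"qt": "dismax", "q": query, "qf": qf}
    # urlencode turns spaces into '+' and '^' into '%5E'.
    return base_url + "?" + urlencode(params)
```

With `qf="keywords^3.0 title^2.0 content"`, the `content` field gets the default boost of 1.0.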

Re: Cannot index '&' this character using post.jar

2007-06-08 Thread Mike Klaas
On 8-Jun-07, at 10:19 AM, Tiong Jeffrey wrote: Hi all, I tried to index a document that has '&' using post.jar. But during the indexing it causes error and it wont finish the indexing. Can I know why is this and how to prevent this? Thanks! XML requires &'s to be escaped. & -> &amp; -Mike
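When the XML is generated by a script, the escaping is best left to a library. A minimal sketch using the Python standard library; `field_xml` is a hypothetical helper that produces one `<field>` element of a Solr add document:

```python
from xml.sax.saxutils import escape, quoteattr


def field_xml(name, value):
    """Render one <field> element with attribute and text properly escaped."""
    # quoteattr adds the surrounding quotes; escape handles &, < and >.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
```

The same applies to `<` and `>` in field values, which `escape` handles as well.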

Re: To make sure XML is UTF-8

2007-06-08 Thread Mike Klaas
-connection encoding. I think the default is 'latin-1'; try googling 'mysql collation'. You could use python to convert the file: open('outfile', 'wb').write(open('infile', 'rb').read().decode('latin-1').encode('utf-8')) regards, -Mike

Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-11 Thread Mike Klaas
creates an inverted index; the storage system keeps track of the data you give it _before_ analysis/tokenization. If there is analysis you'd like to do that also applies to the stored status of the doc, it's probably easier to apply it before passing the data to Solr. -Mike On 08

Re: fq with standard request handler

2007-06-11 Thread Mike Klaas
a month ago. I don't recall seeing any bugs with the 'fq' param. er... since the second batch of queries returned no hits, does that not indicate that the problem _isn't_ with fq? You practically stripped it down to raw lucene territory here. -Mike

Re: facet query counts

2007-06-14 Thread Mike Klaas
es to be storing in a binary float--you're probably comparing mostly the exponent, which is not necessarily disjoint. Have you tried sdouble? And this problem seems to occur in most (if not all) of my range queries. Is there anything that I am doing wrong here? Is this true on other field types as well? -Mike

Re: problems getting data into solr index

2007-06-14 Thread Mike Klaas
is handling it? I suspect it is encoded somehow, which could be problematic. Is it going through a web browser? How is it getting into mysql? -Mike

Re: Copying part of index directory

2007-06-16 Thread Mike Klaas
) ? No, the index dir is determined by solrconfig.xml of the Solr instance. The python client can only be used to connect to an already-running instance. -Mike

Re: problems getting data into solr index

2007-06-16 Thread Mike Klaas
ascii', 'ignore')) # assuming s is a bytestring u.encode('ascii', 'ignore') # assuming u is a unicode string -Mike On 15-Jun-07, at 2:45 AM, vanderkerkoff wrote: Hi Mike The characters that are giving us problems are the old favourites of apostrophe's an

Re: Copying part of index directory

2007-06-18 Thread Mike Klaas
entirely by your requirements. Since you wanted to create a new subindex, you'll have to set up another Solr instance somewhere. Another machine, another webapp, etc. -Mike

Re: problems getting data into solr index

2007-06-18 Thread Mike Klaas
On 18-Jun-07, at 6:27 AM, vanderkerkoff wrote: Cheesr Mike, read the page, it's starting to get into my brian now. Django was giving me unicode string, so I did some encoding and decoding and now the data is getting into solr, and it's simply not passing the characters that a

Re:

2007-06-19 Thread Mike Klaas
you find that there is a performance problem. -Mike

Re: add CJKTokenizer to solr

2007-06-19 Thread Mike Klaas
I get this error, I searched the email archive, it seems working for other users. Does anyone know what is the problem? CJKTokenizerFactory that I am using is appended. Would you be interested in contributing this class to solr? -Mike

Re: problems getting data into solr index

2007-06-20 Thread Mike Klaas
On 20-Jun-07, at 6:38 AM, vanderkerkoff wrote: Hello Mike, Brian My brain is approaching saturation point and I'm reading these two opinions as opposing each other. I'm sure I'm reading it incorrectly, but they seem to contradict each other. Are they? solr.py ta

RE: Faceted Search!

2007-06-20 Thread Mike Austin
Niraj: What environment are you using? SQL Server/.NET/Windows? or something else? -Mike -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 20, 2007 4:24 PM To: solr-user@lucene.apache.org Subject: Re: Faceted Search! : define the sub-categories

Re: add CJKTokenizer to solr

2007-06-22 Thread Mike Klaas
On 21-Jun-07, at 10:22 PM, Chris Hostetter wrote: like i said though: i'm in favor of factories like this ... i just don't think we should do anything to hide their use and make referring to Tokenizer or TokenFilter class names directly use reflection magically. What would be the best way to

Re: Highlighting in large text fields

2007-06-25 Thread Mike Klaas
made configurable. I'll add it to the future features. -Mike

Re: Solr - autocommit params

2007-06-25 Thread Mike Klaas
re pending documents, is that correct? That is correct. -Mike
