Re: dismax request handler without q

2010-07-20 Thread Joe Calderon
try something like this: q.alt=*:*&fq=keyphrase:hotel though if you dont need to query across multiple fields, dismax is probably not the best choice On Tue, Jul 20, 2010 at 4:57 AM, olivier sallou wrote: > q will search in defaultSearchField if no field name is set, but you can > specify in you
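A minimal sketch of the request suggested in this reply, assuming a Solr instance at a placeholder address (the host, port, and path below are not from the thread). It only builds the URL, so the encoding of `q.alt=*:*` and the `fq` filter is visible.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not taken from the thread.
SOLR_SELECT = "http://localhost:8983/solr/select"


def match_all_with_filter(field, value):
    """Build a dismax request that has no q parameter.

    q.alt supplies the fallback query (*:* matches every document) and
    fq restricts the result set to documents where field matches value.
    """
    params = [
        ("defType", "dismax"),
        ("q.alt", "*:*"),
        ("fq", f"{field}:{value}"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(match_all_with_filter("keyphrase", "hotel"))
```

The filter does the narrowing here, so relevance scoring plays no part; that fits the remark that dismax may not be the best choice when only one field is involved.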

Re: questions about Solr shards

2010-06-28 Thread Joe Calderon
there is a first pass query to retrieve all matching document ids from every shard along with relevant sorting information, the document ids are then sorted and limited to the amount needed, then a second query is sent for the rest of the documents metadata. On Sun, Jun 27, 2010 at 7:32 PM, Babak
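The two passes described in this reply can be modelled in a few lines. This is a toy simulation over in-memory lists, not Solr code: the shard names, document fields, and helper names are made up for illustration.

```python
def first_pass(shard_docs, rows):
    """Shard-local pass: return only (score, id) pairs for the shard's best rows."""
    ranked = sorted(((d["score"], d["id"]) for d in shard_docs), reverse=True)
    return ranked[:rows]


def distributed_search(shards, rows):
    """Merge ids from every shard, then fetch full documents for the winners only."""
    # Pass 1: collect ids plus sort information from every shard.
    candidates = []
    for name, docs in shards.items():
        for score, doc_id in first_pass(docs, rows):
            candidates.append((score, doc_id, name))
    # Sort the merged ids and keep only as many as the request needs.
    winners = sorted(candidates, reverse=True)[:rows]
    # Pass 2: ask the owning shard for the remaining fields of each winner.
    results = []
    for _score, doc_id, name in winners:
        results.append(next(d for d in shards[name] if d["id"] == doc_id))
    return results


# Hypothetical data: two shards with two documents each.
shards = {
    "shard1": [{"id": "a", "score": 3.0, "title": "A"},
               {"id": "b", "score": 1.0, "title": "B"}],
    "shard2": [{"id": "c", "score": 2.0, "title": "C"},
               {"id": "d", "score": 0.5, "title": "D"}],
}
print([d["id"] for d in distributed_search(shards, 2)])
```

Only the winning documents have their stored fields fetched, which is the point of splitting the work into two passes.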

Re: Strange query behavior

2010-06-28 Thread Joe Calderon
splitOnCaseChange is creating multiple tokens from 3dsMax, disable it or enable catenateAll, use the analysis page in the admin tool to see exactly how your text will be indexed by analyzers without having to reindex your documents, once you have it right you can do a full reindex. On Mon, Jun 28,

Re: preside != president

2010-06-28 Thread Joe Calderon
the general consensus among people who run into the problem you have is to use a plurals only stemmer, a synonyms file or a combination of both (for irregular nouns etc) if you search the archives you can find info on a plurals stemmer On Mon, Jun 28, 2010 at 6:49 AM, wrote: > Thanks for the ti

Re: SOLR partial string matching question

2010-06-22 Thread Joe Calderon
you want a combination of WhitespaceTokenizer and EdgeNGramFilter http://lucene.apache.org/solr/api/org/apache/solr/analysis/WhitespaceTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/EdgeNGramFilterFactory.html the first will create tokens for each word the second
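A rough model of what a whitespace tokenizer followed by an edge n-gram filter produces. This is a simplified Python sketch, not the Solr implementation; the function names and the default gram sizes are assumptions chosen for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of token whose length is between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=1, max_gram=15):
    """Split on whitespace, then expand every token into its edge n-grams."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


print(analyze("solr search", min_gram=2, max_gram=4))
```

Because every prefix is indexed as its own term, a partial query such as `sea` can match the indexed word `search` without a wildcard.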

Re: Comma delemitered words shawn in terms like one word.

2010-06-18 Thread Joe Calderon
set generateWordParts=1 on wordDelimiter or use PatternTokenizerFactory to split on commas http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory you can use the analysis page to see what your filter chains are going to do before you index /admin/analysis.jsp

Re: federated / meta search

2010-06-17 Thread Joe Calderon
yes, you can use distributed search across shards with different schemas as long as the query only references overlapping fields, i usually test adding new fields or tokenizers on one shard and deploy only after i verified its working properly On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma wrote:

Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
see yonik's post on nested queries http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ so for example i thought you could possibly do a dismax query across the main fields (in this case just title) and OR that with _query_:"{!description:'oil spill'~4}" On Thu, Jun 17, 2010 at

Re: Exact match on a filter

2010-06-17 Thread Joe Calderon
use a copyField and index the copy as type string, exact matches on that field should then work as the text wont be tokenized On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski wrote: > Hi, > > I'm trying with no luck to filter on the exact-match value of a field. > Specifically: >  fq=brand:appl

Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
the qs parameter affects matching, but you have to wrap your query in double quotes, ex q="oil spill"&qf=title description&qs=4&defType=dismax im not sure how to formulate such a query to apply that rule just to description, maybe with nested queries ... On Thu, Jun 17, 2010 at 12:01 PM, Blargy
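A small sketch of how the example parameters in this reply could be assembled, with the phrase wrapped in double quotes so that `qs` has something to apply to. The helper name is made up, and only the query string is built, no request is sent.

```python
from urllib.parse import urlencode


def phrase_slop_params(phrase, fields, slop):
    """Build dismax parameters for a quoted phrase query with query slop."""
    params = {
        "defType": "dismax",
        "q": f'"{phrase}"',       # the double quotes make it a phrase query
        "qf": " ".join(fields),   # fields are space separated
        "qs": str(slop),          # slop applied to the explicit phrase
    }
    return urlencode(params)


print(phrase_slop_params("oil spill", ["title", "description"], 4))
```

As the reply notes, this applies the slop to every field in `qf`; it does not address restricting the rule to `description` alone.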

Re: how to have "shards" parameter by default

2010-06-10 Thread Joe Calderon
youve created an infinite loop, the shard you query calls all other shards and itself and so on, create a separate requestHandler and query that, ex localhost:7500/solr,localhost:7501/solr,localhost:7502/solr,localhost:7503/solr,localhost:7504/solr,localhost:7505/solr,localhost:7506/

Re: Field Collapsing: How to estimate total number of hits

2010-05-12 Thread Joe Calderon
dont know if its the best solution but i have a field i facet on called type its either 0,1, combined with collapse.facet=before i just sum all the values of the facet field to get the total number found if you dont have such a field u can always add a field with a single value --joe On Wed, May
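The summing step can be sketched as follows, assuming the facet counts arrive in the flat `[value, count, value, count, ...]` layout that Solr's JSON response uses by default. The numbers below are made up.

```python
def total_from_facet(flat_counts):
    """Sum the counts in a flat [value, count, value, count, ...] list."""
    return sum(flat_counts[1::2])


# Made-up response fragment for the "type" field mentioned above.
facet_fields = {"type": ["0", 120, "1", 30]}
print(total_from_facet(facet_fields["type"]))
```

This only gives the full total if every document has exactly one value in the facet field, which is why the reply suggests adding a single-valued field when none exists.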

synonym filter and offsets

2010-04-19 Thread Joe Calderon
hello *, im having issues with the synonym filter altering token offsets, my input text is "saturday night live", it is tokenized by the whitespace tokenizer yielding 3 tokens [saturday, 0,8], [night, 9, 14], [live, 15,19] on indexing these are passed through a synonym filter that has this line s
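The token offsets quoted in this message can be reproduced with a short Python sketch. It models a whitespace tokenizer only (the synonym step is not modelled), and the function name is made up.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace and report (term, start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


print(whitespace_tokens("saturday night live"))
```

The end offset is exclusive, so `saturday` spans characters 0 to 8 and `night` starts at 9, after the space.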

Re: highlighter issue

2010-04-02 Thread Joe Calderon
ing the RemoveDuplicatesTokenFilter(Factory) do the trick here? > >        Erik > > On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote: > >> hello *, i have a field that is indexing the string "the >> ex-girlfriend" as these tokens: [the, exgirlfriend, ex, gi

highlighter issue

2010-04-02 Thread Joe Calderon
hello *, i have a field that is indexing the string "the ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend] then they are passed to the edgengram filter, this allows me to match different user spellings and allows for partial highlighting, however a token like 'ex' would get genera

how to create this highlighter behaviour

2010-03-29 Thread Joe Calderon
hello *, ive been using the highlighter and been pretty happy with its results, however theres an edge case im not sure how to fix for query: amazing grace the record matched and highlighted is amazing rendition of amazing grace is there any way to only highlight amazing grace without using phr

Re: Need help in deploying the modified SOLR source code

2010-03-12 Thread Joe Calderon
do `ant clean dist` within the solr source and use the resulting war file, though in the future you might think about extending the built in parser and creating a parser plugin rather than modifying the actual sources see http://wiki.apache.org/solr/SolrPlugins#QParserPlugin for more info --jo

Re: Highlighting

2010-03-10 Thread Joe Calderon
no problem with the query. > > But from what I believe it should wrap around the text in the result. > > So if I search ie Andrew  within the return content Ie would have the > contents with the word Andrew > > and hl.fl=attr_content > > Thank you for you help > >

Re: Highlighting

2010-03-10 Thread Joe Calderon
just to make sure were on the same page, youre saying that the highlight section of the response is empty right? the results section is never highlighted but a separate section contains the highlighted fields specified in hl.fl= On Wed, Mar 10, 2010 at 5:23 AM, Ahmet Arslan wrote: > > >> Yes Cont

Re: Highlighting

2010-03-09 Thread Joe Calderon
did u enable the highlighting component in solrconfig.xml? try setting debugQuery=true to see if the highlighting component is even being called... On Tue, Mar 9, 2010 at 12:23 PM, Lee Smith wrote: > Hey All > > I have indexed a whole bunch of documents and now I want to search against > them. >

Re: indexing a huge data

2010-03-05 Thread Joe Calderon
ive found the csv update to be exceptionally fast, though others enjoy the flexibility of the data import handler On Fri, Mar 5, 2010 at 10:21 AM, Mark N wrote: > what should be the fastest way to index a documents , I am indexing huge > collection of data after extracting certain meta - data inf

Re: Issue on stopword list

2010-03-02 Thread Joe Calderon
or you can try the commongrams filter that combines tokens next to a stopword On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wrote: > Don't remove stopwords if you want to search on them. --wunder > > On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote: > >> This is a classic problem with Stopword

Re: Search Result differences Standard vs DisMax

2010-03-01 Thread Joe Calderon
what are you using for the mm parameter? if you set it to 1 only one word has to match, On 03/01/2010 05:07 PM, Steve Reichgut wrote: ***Sorry if this was sent twice. I had connection problems here and it didn't look like the first time it went out I have been testing out results for some

Re: Solr 1.4 distributed search configuration

2010-02-26 Thread Joe Calderon
you can set a default shard parameter on the request handler doing distributed search, you can set up two different request handlers one with shards default and one without On Thu, Feb 25, 2010 at 1:35 PM, Jeffrey Zhao wrote: > Now I got it, just forgot put qt=search in query. > > By the way, in

Re: Changing term frequency according to value of one of the fields

2010-02-26 Thread Joe Calderon
extend the similarity class, compile it against the jars in lib, put in a path solr can find and set your schema to use it http://wiki.apache.org/solr/SolrPlugins#Similarity On 02/25/2010 10:09 PM, Pooja Verlani wrote: Hi, I want to modify Similarity class for my app like the following- Right no

Re: Autosuggest/Autocomplete with solr 1.4 and EdgeNGrams

2010-02-24 Thread Joe Calderon
i had to create an autosuggest implementation not too long ago, originally i was using faceting, where i would match wildcards on a tokenized field and facet on an unaltered field, this had the advantage that i could do everything from one index, though it was also limited by the fact suggestions ca

Re: including 'the' dismax query kills results

2010-02-18 Thread Joe Calderon
use the common grams filter, itll create tokens for stop words and their adjacent terms On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin wrote: > I've noticed some peculiar behavior with the dismax searchhandler. > > In my case I'm making the search "The British Open", and am getting 0 > resul

Re: Reindex after changing defaultSearchField?

2010-02-17 Thread Joe Calderon
no, youre just changing how you query the index, not the actual index, you will need to restart the servlet container or reload the core for the config changes to take effect tho On 02/17/2010 10:04 AM, Frederico Azeiteiro wrote: Hi, If i change the "defaultSearchField" in the core schem

Re: and DisMaxRequestHandler

2010-02-15 Thread Joe Calderon
no but you can set a default for the qf parameter with the same value On 02/15/2010 01:50 AM, Steve Radhouani wrote: Hi there, Can the option be used by the DisMaxRequestHandler? Thanks, -Steve

Re: problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Joe Calderon
lucene-2266 filed and patch posted. On 02/13/2010 09:14 PM, Robert Muir wrote: Joe, can you open a Lucene JIRA issue for this? I just glanced at the code and it looks like a bug to me. On Sun, Feb 14, 2010 at 12:07 AM, Joe Calderon wrote: i ran into a problem while using the edgengramtoken

problem with edgengramtokenfilter and highlighter

2010-02-13 Thread Joe Calderon
i ran into a problem while using the edgengramtokenfilter, it seems to report incorrect offsets when generating tokens, more specifically all the tokens have offset 0 and term length as start and end, this leads to goofy highlighting behavior when creating edge grams for tokens beyond the first one
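The symptom described here can be illustrated with a simplified model: edge grams whose offsets are measured from 0 instead of from the start of their source token. This is a Python sketch of the behaviour as reported, not the Lucene code, and the `use_token_start` switch is a made-up device for comparing the two cases.

```python
import re


def edge_grams_with_offsets(text, max_gram=3, use_token_start=True):
    """Emit (gram, start, end) for each edge gram of each whitespace token.

    With use_token_start=False every gram reports 0 as its start and its own
    length as its end, which mimics the behaviour described in the message.
    """
    out = []
    for m in re.finditer(r"\S+", text):
        base = m.start() if use_token_start else 0
        token = m.group()
        for n in range(1, min(max_gram, len(token)) + 1):
            out.append((token[:n], base, base + n))
    return out


print(edge_grams_with_offsets("ab cd", max_gram=2, use_token_start=False))
print(edge_grams_with_offsets("ab cd", max_gram=2, use_token_start=True))
```

For the first token the two variants agree, which is consistent with the report that the highlighting only goes wrong for tokens beyond the first one.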

reloading sharedlib folder

2010-02-12 Thread Joe Calderon
when using solr.xml, you can specify a sharedlib directory to share among cores, is it possible to reload the classes in this dir without having to restart the servlet container? it would be useful to be able to make changes to those classes on the fly or be able to drop in new plugins

Re: How to reindex data without restarting server

2010-02-11 Thread Joe Calderon
if you use the core model via solr.xml you can reload a core without having to restart the servlet container, http://wiki.apache.org/solr/CoreAdmin On 02/11/2010 02:40 PM, Emad Mushtaq wrote: Hi, I would like to know if there is a way of reindexing data without restarting the server. Lets sa

Re: analysing wild carded terms

2010-02-10 Thread Joe Calderon
sorry, what i meant to say is apply text analysis to the part of the query that is wildcarded, for example if a term with latin1 diacritics is wildcarded ide still like to run it through ISOLatin1Filter On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi wrote: >> hello *, quick question, what would i h

Re: question/suggestion for Solr-236 patch

2010-02-10 Thread Joe Calderon
you can do that very easily yourself in a post processing step after you receive the solr response On Wed, Feb 10, 2010 at 8:12 AM, gdeconto wrote: > > I have been able to apply and use the solr-236 patch (field collapsing) > successfully. > > Very, very cool and powerful. > > My one comment/conc

analysing wild carded terms

2010-02-09 Thread Joe Calderon
hello *, quick question, what would i have to change in the query parser to allow wildcarded terms to go through text analysis?

Re: old wildcard highlighting behaviour

2010-02-06 Thread Joe Calderon
On iPhone so don't remember exact param I named it, but check wiki - > something like hl.highlightMultiTerm - set it to false. > > - Mark > > http://www.lucidimagination.com (mobile) > > On Feb 6, 2010, at 12:00 AM, Joe Calderon wrote: > >> hello *, currentl

old wildcard highlighting behaviour

2010-02-05 Thread Joe Calderon
hello *, currently with hl.usePhraseHighlighter=true, a query for (joe jack*) will highlight joe jackson, however after reading the archives, what im looking for is the old 1.1 behaviour so that only joe jack is highlighted, is this possible in solr 1.5 ? thx much --joe

source tree for lucene

2010-02-04 Thread Joe Calderon
i want to recompile lucene with http://issues.apache.org/jira/browse/LUCENE-2230, but im not sure which source tree to use, i tried using the implied trunk revision from the admin/system page but solr fails to build with the generated jars, even if i exclude the patches from 2230... im wondering i

fuzzy matching / configurable distance function?

2010-02-04 Thread Joe Calderon
is it possible to configure the distance formula used by fuzzy matching? i see there are others listed on the function query page under strdist but im wondering if they are applicable to fuzzy matching thx much --joe

Re: distributed search and failed core

2010-02-03 Thread Joe Calderon
a shard has failed On Wed, Feb 3, 2010 at 10:55 AM, Yonik Seeley wrote: > On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon wrote: >> hello *, in distributed search when a shard goes down, an error is >> returned and the search fails, is there a way to avoid the error and >> re

Re: Basic indexing question

2010-02-02 Thread Joe Calderon
rch will have to collate associated > information into a presentable screen anyhow - so I'm not too worried about > info being returned by Solr as such) > > Would that be a reasonable way of using Solr > > > > > -Original Message- > From: Joe Calderon [mailto:

Re: Basic indexing question

2010-02-02 Thread Joe Calderon
by default solr will only search the default fields, you have to either query all fields field1:(ore) or field2:(ore) or field3:(ore) or use a different query parser like dismax On Tue, Feb 2, 2010 at 3:31 PM, Stefan Maric wrote: > I have got a basic configuration of Solr up and running and have
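A tiny sketch of the first option, building one query that names every field explicitly. The helper name is made up, and the operator is written in upper case because the standard Lucene query parser expects `OR` in capitals.

```python
def any_field_query(term, fields):
    """Return a query that matches term in any of the given fields."""
    return " OR ".join(f"{field}:({term})" for field in fields)


print(any_field_query("ore", ["field1", "field2", "field3"]))
```

The list of fields has to be kept in sync with the schema by hand, which is one reason the reply points to dismax as the alternative.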

distributed search and failed core

2010-01-29 Thread Joe Calderon
hello *, in distributed search when a shard goes down, an error is returned and the search fails, is there a way to avoid the error and return the results from the shards that are still up? thx much --joe

Re: index of facet fields are not same as original string

2010-01-28 Thread Joe Calderon
facets are based off the indexed version of your string, not the stored version, you probably have an analyzer thats removing punctuation, most people index the same field multiple ways for different purposes, matching, sorting, faceting etc... index a copy of your field as string type and facet o

Re: create requesthandler with default shard parameter for different query parser

2010-01-21 Thread Joe Calderon
main reason im creating the new request handler, or do i put them all as defaults under my new request handler and let the query parser use whichever ones it supports? On Thu, Jan 21, 2010 at 11:45 AM, Yonik Seeley wrote: > On Thu, Jan 21, 2010 at 2:39 PM, Joe Calderon wrote: >> hello *

create requesthandler with default shard parameter for different query parser

2010-01-21 Thread Joe Calderon
hello *, what is the best way to create a requesthandler for distributed search with a default shards parameter but that can use different query parsers thus far i have *,score json host0:8080/solr/core0,host1:8080/solr/core1,host2:8080/solr/core2,localhost:8080

Re: Field collapsing patch error

2010-01-19 Thread Joe Calderon
this has come up before, my suggestions would be to use the 12/24 patch with trunk revision 892336 http://www.lucidimagination.com/search/document/797549d29e1810d9/solr_1_4_field_collapsing_what_are_the_steps_for_applying_the_solr_236_patch 2010/1/19 Licinio Fernández Maurelo : > Hi folks, > > i'

Re: question about date boosting

2010-01-12 Thread Joe Calderon
I think you need to use the new trieDateField On 01/12/2010 07:06 PM, Daniel Higginbotham wrote: Hello, I'm trying to boost results based on date using the first example here:http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents However, I'm getting an er

Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Joe Calderon
it seems to be in flux right now as the solr developers slowly make improvements and ingest the various pieces into the solr trunk, i think your best bet might be to use the 12/24 patch and fix any errors where it doesnt apply cleanly im using solr trunk r892336 with the 12/24 patch --joe On

Re: help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
matches sorry if i was unclear --joe On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher wrote: > > On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote: >> >> 1. given a set of fields how to return matches that match across them >> but not just one specific one, ex im using a

help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
hello *, im looking for help on writing queries to implement a few business rules. 1. given a set of fields how to return matches that match across them but not just one specific one, ex im using a dismax parser currently but i want to exclude any results that only match against a field called 'd

custom wildcarding in qparser

2010-01-08 Thread Joe Calderon
hello *, what do i need to do to make a query parser that works just like the standard query parser but also runs analyzers/tokenizers on a wildcarded term, specifically im looking to wildcard only the last token, ive tried the edismax qparser and the prefix qparser and neither is exactly what i

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

2009-12-31 Thread Joe Calderon
"if this is the expected behaviour is there a way to override it?"[1] [1] me On Thu, Dec 31, 2009 at 10:13 AM, AHMET ARSLAN wrote: >> Hello *, im trying to make an index >> to support spelling errors/fuzzy >> matching, ive indexed my document titles with >> NGramFilterFactory >> minGramSize=2 ma

analyzer type="query" with NGramTokenFilterFactory forces phrase query

2009-12-31 Thread Joe Calderon
Hello *, im trying to make an index to support spelling errors/fuzzy matching, ive indexed my document titles with NGramFilterFactory minGramSize=2 maxGramSize=3, using the analysis page i can see the common grams match between the indexed value and the query value, however when i try to do a query

score = result of function query

2009-12-30 Thread Joe Calderon
how can i make the score be solely the output of a function query? the function query wiki page details something like q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score but that doesnt seem to work --joe

boosting on string distance

2009-12-29 Thread Joe Calderon
hello *, i want to boost documents that match the query better, currently i also index my field as a string and boost if i match the string field but im wondering if its possible to boost with bf parameter with a formula using the function strdist(), i know one of the columns would be the field nam

Re: SOLR Performance Tuning: Pagination

2009-12-24 Thread Joe Calderon
fwiw, when implementing distributed search i ran into a similar problem, but then i noticed even google doesnt let you go past page 1000, easier to just set a limit on start On Thu, Dec 24, 2009 at 8:36 AM, Walter Underwood wrote: > When do users do a query like that? --wunder > > On Dec 24, 200
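One way to apply such a limit in client code, as a sketch. The cap of 1000 and both helper names are assumptions for illustration. The second function shows why deep paging is costly in distributed search: each shard has to return its top start + rows candidates before they can be merged.

```python
MAX_START = 1000  # hypothetical cap, tune for your own deployment


def clamp_paging(start, rows, max_start=MAX_START):
    """Clamp the requested offset so callers cannot page arbitrarily deep."""
    return max(0, min(start, max_start)), rows


def per_shard_fetch(start, rows):
    """Number of candidates each shard must return for this page."""
    return start + rows


start, rows = clamp_paging(50000, 10)
print(start, rows, per_shard_fetch(start, rows))
```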

wildcard oddity

2009-12-15 Thread Joe Calderon
im trying to do a wild card search: "q":"item_title:(gets*)" returns no results, "q":"item_title:(gets)" returns results, "q":"item_title:(get*)" returns results. seems like * at the end of a token is requiring a character, instead of being 0 or more its acting like 1 or more. the text im tr

Re: apply a patch on solr

2009-11-03 Thread Joe Calderon
sorry got cut off, patch, then ant clean dist, will give you the modified solr war file, if it doesnt apply cleanly (which i dont think is currently the case), you can go back to the latest revision referenced in the patch, On Tue, Nov 3, 2009 at 8:17 PM, Joe Calderon wrote: > patch -p0 <

Re: apply a patch on solr

2009-11-03 Thread Joe Calderon
patch -p0 < /path/to/field-collapse-5.patch On Tue, Nov 3, 2009 at 7:48 PM, michael8 wrote: > > Hmmm, perhaps I jumped the gun.  I just looked over the field collapse patch > for SOLR-236 and each file listed in the patch has its own revision #. > > E.g. from field-collapse-5.patch: > --- src/jav

tokenize after filters

2009-11-02 Thread Joe Calderon
is it possible to tokenize a field on whitespace after some filters have been applied: ex: "A + W Root Beer" the field uses a keyword tokenizer to keep the string together, then it will get converted to "aw root beer" by a custom filter ive made, i now want to split that up into 3 tokens (aw, roo

faceting ordering

2009-10-28 Thread Joe Calderon
curious...is it possible to have faceted results ordered by score? im having a problem where im faceting on a field while searching for the same word twice, for example: im searching for "the the" on a tokenized field and faceting by the untokenized version, faceting returns records with "the the

field collapsing exception

2009-10-26 Thread Joe Calderon
found another exception, i cant find specific steps to reproduce besides starting with an unfiltered result and then given an int field with values (1,2,3) filtering by 3 triggers it sometimes, this is in an index with very frequent updates and deletes --joe java.lang.NullPointerException

profiling solr

2009-10-26 Thread Joe Calderon
as a curiosity ide like to use a profiler to see where within solr queries spend most of their time, im curious what tools if any others use for this type of task.. im using jetty as my servlet container so ideally ide like a profiler thats compatible with it --joe

field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)

2009-10-23 Thread Joe Calderon
seems to happen when sorting on anything besides strictly score, even score desc, num desc triggers it, using latest nightly and 10/14 patch Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at org.apache.lucene.search.FieldComparat

boostQParser and dismax

2009-10-22 Thread Joe Calderon
hello *, i was just reading over the wiki function query page and found this little gem for boosting recent docs thats much better than what i was doing before recip(ms(NOW,mydatefield),3.16e-11,1,1) my question is, at the bottom it says The most effective way to use such a boost is to multiply
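To see what the quoted function does, here is a worked calculation in Python. It relies on Solr's definition `recip(x,m,a,b) = a / (m*x + b)`; the helper names and the sample ages are made up. The constant 3.16e-11 is roughly one divided by the number of milliseconds in a year.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost for a document of the given age, using the constants quoted above."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 10):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

A brand-new document gets a boost of 1, a one-year-old document about 1/2, and a two-year-old document about 1/3, so the curve flattens instead of dropping to zero.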

Re: max words/tokens

2009-10-20 Thread Joe Calderon
cool np, i just didnt want to duplicate code if that already existed. On Tue, Oct 20, 2009 at 12:49 PM, Yonik Seeley wrote: > On Tue, Oct 20, 2009 at 1:53 PM, Joe Calderon wrote: >> i have a pretty basic question, is there an existing analyzer that >> limits the number of words

max words/tokens

2009-10-20 Thread Joe Calderon
i have a pretty basic question, is there an existing analyzer that limits the number of words/tokens indexed from a field? lets say i only wanted to index the top 25 words... thx much --joe

lucene 2.9 bug

2009-10-16 Thread Joe Calderon
hello * , ive read in other threads that lucene 2.9 had a serious bug in it, hence trunk moved to 2.9.1 dev, im wondering what the bug is as ive been using the 2.9.0 version for the past weeks with no problems, is it critical to upgrade? --joe

how to get field contents out of Document object

2009-10-14 Thread Joe Calderon
hello *, sorry if this seems like a dumb question, im still fairly new to working with lucene/solr internals. given a Document object, what is the proper way to fetch an integer value for a field called "num_in_stock", it is both indexed and stored thx much --joe

Re: Solr 1.4 release candidate

2009-10-14 Thread Joe Calderon
maybe im just not familiar with the way the version numbers work in trunk but when i build the latest nightly the jars have names like *-1.5-dev.jar, is that normal? On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley wrote: > Folks, we've been in code freeze since Monday and a test release > candida

concatenating tokens

2009-10-08 Thread Joe Calderon
hello *, im using a combination of tokenizers and filters that give me the desired tokens, however for a particular field i want to concatenate these tokens back to a single string, is there a filter to do that, if not what are the steps needed to make my own filter to concatenate tokens? for exam

Re: stats page slow in latest nightly

2009-10-06 Thread Joe Calderon
dn't bring it up when Hoss made my > life easier with his simpler patch. > > Yonik Seeley wrote: >> Might be the new Lucene fieldCache stats stuff that was recently added? >> >> -Yonik >> http://www.lucidimagination.com >> >> >> On Tue, Oct 6, 2

stats page slow in latest nightly

2009-10-06 Thread Joe Calderon
hello *, ive been noticing that /admin/stats.jsp is really slow in the recent builds, has anyone else encountered this? --joe

Re: JVM OOM when using field collapse component

2009-10-02 Thread Joe Calderon
M when having an > index of a few million. > > Martijn > > 2009/10/2 Joe Calderon : >> i gotten two different out of memory errors while using the field >> collapsing component, using the latest patch (2009-09-26) and the >> latest nightly, >> >> has anyone el

JVM OOM when using field collapse component

2009-10-01 Thread Joe Calderon
ive gotten two different out of memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly, has anyone else encountered similar problems? my collection is 5 million results but ive gotten the error collapsing as few as a few thousand SEVE

Re: field collapsing sums

2009-10-01 Thread Joe Calderon
thx for the reply, i just want the number of dupes in the query result, but it seems i dont get the correct totals, for example a non collapsed dismax query for belgian beer returns X number results but when i collapse and sum the number of docs under collapse_counts, its much less than X it does

Re: field collapsing sums

2009-10-01 Thread Joe Calderon
ks, >> >> Matt Weber >> >> On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: >> >>> Hi, >>> >>> At the moment I think the most appropriate place to put it is in the >>> AbstractDocumentCollapser (in the getCollapseInfo method). Though,

changing dismax parser to not treat symbols differently

2009-09-30 Thread Joe Calderon
how would i go about modifying the dismax parser to treat +/- as regular text?

field collapsing sums

2009-09-30 Thread Joe Calderon
hello all, i have a question on the field collapsing patch, say i have an integer field called "num_in_stock" and i collapse by some other column, is it possible to sum up that integer field and return the total in the output, if not how would i go about extending the collapsing component to suppor

boost function for date as unix stamp

2009-09-25 Thread Joe Calderon
hello *, i read on the wiki about using recip(rord(...)...) to boost recent documents with a date field, does anyone have a good function for doing something similar with unix timestamps? if not, is there a lot of overhead related to counting the number of distinct values for rord() ? thx much

Re: KStem download

2009-09-14 Thread Joe Calderon
is the source for the lucid kstemmer available ? from the lucid solr package i only found the compiled jars On Mon, Sep 14, 2009 at 11:04 AM, Yonik Seeley wrote: > On Mon, Sep 14, 2009 at 1:56 PM, darniz wrote: >> Pascal Dimassimo wrote: >>> >>> Hi, >>> >>> I want to try KStem. I'm following the

query parser question

2009-09-10 Thread Joe Calderon
i have a field called text_stem that has a kstemmer on it, im having trouble matching wildcard searches on a word that got stemmed, for example i index the word "america's", which according to analysis.jsp after stemming gets indexed as "america" when matching i do a query like myfield:(ame*) which

help with solr.PatternTokenizerFactory

2009-09-09 Thread Joe Calderon
hello *, im not sure what im doing wrong i have this field defined in schema.xml, using admin/analysis.jsp its working as expected, but when i try to update via csvhandler i get Error 500 org.apache.solr.analysis.PatternTokeni

Re: Geographic clustering

2009-09-08 Thread Joe Calderon
there are clustering libraries like http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/, that have bindings to perl/python, you can preprocess your results and create clusters for each zoom level On Tue, Sep 8, 2009 at 8:08 AM, gwk wrote: > Hi, > > I just completed a simple proof-of-concept

stemming plurals

2009-09-04 Thread Joe Calderon
i saw some posts regarding stemming plurals in the archives from 2008, i was wondering if this was ever integrated or if custom hackery is still needed, is there something like a stemplurals analyzer or is the kstemmer the closest thing? thx much --joe

score = sum of boosts

2009-09-02 Thread Joe Calderon
hello *, what would be the best approach to return the sum of boosts as the score? ex: a dismax handler boosts matches to field1^100 and field2^50, a query matches both fields hence the score for that row would be 150 is this something i could do with a function query or do i need to hack up Di

Re: Responses getting truncated

2009-08-28 Thread Joe Calderon
yonik has a point, when i ran into this i also upgraded to the latest stable jetty, im using jetty 6.1.18 On 08/28/2009 04:07 PM, Rupert Fiasco wrote: I deployed LucidWorks with my existing solrconfig / schema and re-indexed my data into it and pushed it out to production, we'll see how it stac

Re: Responses getting truncated

2009-08-28 Thread Joe Calderon
i had a similar issue with text from past requests showing up, this was on 1.3 nightly, i switched to using the lucid build of 1.3 and the problem went away, im using a nightly of 1.4 right now also without probs, then again your mileage may vary as i also made a bunch of schema changes that mi

Re: non-exhaustive results?

2009-08-28 Thread Joe Calderon
facet.mincount=1, facet.limit=-1 On 08/28/2009 09:15 AM, Candide Kemmler wrote: Sorry, I have misinterpreted my test results. In fact, I can see that facets are missing in the original search. So the question becomes: how is it possible that a search doesn't report all the facets of a specific r

extended documentation on analyzers

2009-08-27 Thread Joe Calderon
is there an online resource or a book that contains a thorough list of tokenizers and filters available and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful but i would like to go through additional filters to make sure im not reinventing the wheel

shingle filter

2009-08-24 Thread Joe Calderon
hello *, im currently faceting on a shingled field to obtain popular phrases and its working well, however ide like to limit the number of shingles that get created, the solr.ShingleFilterFactory supports maxShingleSize, can it be made to support a minimum as well? can someone point me in the right

where to get solr 1.4 nightly

2009-08-20 Thread Joe Calderon
i want to try out the improvements in 1.4 but the nightly site is down http://people.apache.org/builds/lucene/solr/nightly/ is there a mirror for nightlies? --joe

Re: dealing with duplicates

2009-08-10 Thread Joe Calderon
ELECT id FROM videos WHERE title LIKE 'family guy' AND desc LIKE 'stewie%' AND is_dup = 0 ) ) ) ORDER BY views LIMIT 10 can a similar query be written in lucene or do i need to structure my index differently to be able to do such a query? thx much --joe On S

concurrent csv loading

2009-08-06 Thread Joe Calderon
for first time loads i currently post to /update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false, this works well and finishes in about 20 minutes for my work load. this is mostly cpu bound, i have an 8 core box and it seems one takes the brunt of the wo
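The request path quoted in this message can be rebuilt with explicit URL encoding, which makes the tab separator and the backslash escape character easier to read. This sketch only constructs the path; the helper name is made up and nothing is sent to a server.

```python
from urllib.parse import urlencode


def csv_update_path(stream_file):
    """Build the /update/csv path with the parameters quoted in the message."""
    params = [
        ("commit", "false"),
        ("separator", "\t"),      # tab, encoded as %09
        ("escape", "\\"),         # a single backslash, encoded as %5C
        ("stream.file", stream_file),
        ("map", "NULL:"),         # map the literal NULL to an empty value
        ("keepEmpty", "false"),
    ]
    return "/update/csv?" + urlencode(params)


print(csv_update_path("workfile.txt"))
```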

Re: dealing with duplicates

2009-08-01 Thread Joe Calderon
nd didn't have flagged duplicates in the first place?  If so, have > you tried using http://wiki.apache.org/solr/Deduplication ? > >  Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >

dealing with duplicates

2009-07-31 Thread Joe Calderon
hello all, i have a collection of a few million documents; i have many duplicates in this collection. they have been clustered with a simple algorithm, i have a field called 'duplicate' which is 0 or 1 and fields called 'description', 'tags', 'meta', documents are clustered on different criteria and