Re: FastVectorHighlighter wiki corrections
Hi Mike,

(12/01/11 16:14), Michael Lissner wrote:
> - I need help with fragsize. The wiki says to set it to either 0 or a huge number to disable fragmenting. Which is it?

That is for the original Highlighter.

> - the wiki says that hl.useFastVectorHighlighter is defaulted to false. I read somewhere that FVH is True when the data has been indexed with termVectors, termPositions and termOffsets. Is that correct?

Not correct. To use FVH, you need to set the hl.useFastVectorHighlighter parameter to true at query time, and index the highlighting fields with termVectors, termPositions and termOffsets.

koji
--
http://www.rondhuit.com/en/
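A minimal sketch of the query-time half of the answer above, assuming a hypothetical field named text that was already indexed with termVectors, termPositions and termOffsets set to true in schema.xml. The class and method names are illustrative and not part of Solr; only the hl, hl.fl and hl.useFastVectorHighlighter request parameters come from Solr itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the query string for a highlighting request
// that explicitly opts in to the FastVectorHighlighter.
class FvhQueryParams {

    static String build(String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");                          // turn highlighting on
        params.put("hl.fl", field);                        // field to highlight (assumed name)
        params.put("hl.useFastVectorHighlighter", "true"); // defaults to false, so set it per request

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return joined.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Append the result to your own select URL, e.g. .../select?<params>
        System.out.println(build("text:solr", "text"));
    }
}
```

The parameter only has an effect if the field really carries term vectors with positions and offsets, so both halves (schema and request) are needed.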
Re: best query for one-box search string over multiple types & fields?
Johnny,

What you are going to want to do is boost the artist field with respect to the others. For example, using edismax, my 'qf' parameter is:

number^5 title^3 default

so hits in the number field get a five-fold boost and hits in the title field get a three-fold boost. In your case you might want to start with:

artist^5 album^3 song

Getting these parameters right will take a little work, and I would suggest you build a set of searches with known results so you can quickly check the effect of any tweaks you do.

Useful reading would include:

http://wiki.apache.org/solr/SolrRelevancyFAQ
http://wiki.apache.org/solr/SolrRelevancyCookbook
http://www.lucidimagination.com/blog/2011/12/14/options-to-tune-document’s-relevance-in-solr/
http://www.lucidimagination.com/blog/2011/03/10/solr-relevancy-function-queries/

Cheers

François

On Jan 15, 2012, at 1:19 AM, Johnny Marnell wrote:

> hi all,
>
> short of it: i want "queen bohemian rhapsody" to return that song named
> "Bohemian Rhapsody" by the artist named "Queen", rather than songs with
> titles like "Bohemian Rhapsody (Queen Cover)".
>
> i'm indexing a catalog of music with these types of docs and their fields:
>
> artist (artistName), album (albumName, artistName), and song (songName,
> albumName, artistName).
>
> the client is one search box, and i'm having trouble handling searching
> over multiple multifields and weighting their exactness. when a user types
> "queen", i want the artist Queen to be the first hit, and then albums &
> songs titled "queen".
>
> if "queen bohemian rhapsody" is searched, i want to return that song, but
> instead i'm getting songs like "Bohemian Rhapsody (Queen Cover)" by "Stupid
> Queen Tribute Band" because all three terms are in the songName, i'm
> guessing. what kind of query do i need?
>
> i'm indexing all of these fields as multi-fields with ngram, shingle (i
> think this might be really useful for my use case?), keyword, and standard.
> that appears to be working, but i'm not sure how to combine all of this
> together over multiple multi-fields.
>
> if anyone has good links to broadly summarized use cases of Indexing and
> Querying, that would be great - i would think this would be a common
> situation but i can't find any good resources on the web. and i'm having
> trouble understanding scoring and boosting.
>
> this was my first post, hope i did it right, thanks so much!
>
> -j
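As a rough illustration of the suggested starting point, the sketch below assembles the qf value artist^5 album^3 song and the matching edismax request parameters. The field names artist, album and song are the placeholders from the reply (the real schema fields may be artistName, albumName and songName), and the class and method names are made up for this example; only defType, q and qf are actual Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for experimenting with per-field boosts.
class EdismaxQf {

    // Joins fields and boosts into a qf value such as "artist^5 album^3 song".
    // A boost of 1 is the default and is therefore left off.
    static String qf(String[] fields, int[] boosts) {
        if (fields.length != boosts.length) {
            throw new IllegalArgumentException("fields and boosts must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(fields[i]);
            if (boosts[i] != 1) sb.append('^').append(boosts[i]);
        }
        return sb.toString();
    }

    // URL-encoded parameters for a one-box search over the boosted fields.
    static String params(String userQuery, String qf) {
        return "defType=edismax&q=" + enc(userQuery) + "&qf=" + enc(qf);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String qf = qf(new String[] {"artist", "album", "song"}, new int[] {5, 3, 1});
        System.out.println(params("queen bohemian rhapsody", qf));
    }
}
```

Keeping the boosts in one place like this makes it easier to rerun a fixed set of known-answer searches after each tweak, as recommended in the reply.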
Re: Faceting Question
> Does
> that make more sense?

Ah I see.

I'm not certain but take a look at pivot faceting

https://issues.apache.org/jira/browse/SOLR-792

cheers lee c
Re: Determining which shard is failing using partialResults / some other technique?
Hi,

There are a couple of ways of handling this.

One is to do it from the 'client' side - i.e. do a Solr ping to each shard beforehand to find out which shards, if any, are unavailable. This may not always work if you use forwarders/proxies etc.

What we do is add the names of all failed shards to the CommonParams.FAILED_SHARDS parameter in the response header (if partialResults=true), by retrieving the current list (if any) and appending:

Excerpt from SearchHandler.java : handleRequestBody():

[code]
log.info("Waiting for shard replies...");
// now wait for replies, but if anyone puts more requests on
// the outgoing queue, send them out immediately (by exiting
// this loop)
while (rb.outgoing.size() == 0) {
  ShardResponse srsp = comm.takeCompletedOrError();
  if (srsp == null) break; // no more requests to wait for

  // If any shard does not respond (ConnectException) we respond with
  // other shards and set partialResults to true
  for (ShardResponse shardRsp : srsp.getShardRequest().responses) {
    Throwable th = shardRsp.getException();
    if (th != null) {
      log.info("Got shard exception for: " + srsp.getShard() + " : " + th.getClass().getName() + " cause: " + th.getCause());
      if (th instanceof SolrServerException && th.getCause() instanceof Exception) {
        // Was there an exception and return partial results is false? If so, abort everything and rethrow
        if (failOnShardFailure) {
          log.info("Not set for partial results. Aborting...");
          comm.cancelAll();
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, th);
        }
        if (rsp.getResponseHeader().get(CommonParams.FAILED_SHARDS) == null) {
          // First failed shard: store it as "shard|ExceptionSimpleName"
          rsp.getResponseHeader().add(CommonParams.FAILED_SHARDS,
              shardRsp.getShard() + "|" +
              (srsp.getException() != null && srsp.getException().getCause() != null
                  ? srsp.getException().getCause().getClass().getSimpleName()
                  : (th instanceof SolrServerException && th.getCause() != null
                      ? th.getCause().getClass().getSimpleName()
                      : th.getClass().getSimpleName())));
        } else {
          // Append the name of the failed shard, delimiting multiple failed shards with ;
          // (within each entry, | separates the shard from the exception name)
          String prslt = rsp.getResponseHeader().get(CommonParams.FAILED_SHARDS).toString();
          prslt += ";" + shardRsp.getShard() + "|" +
              (srsp.getException() != null && srsp.getException().getCause() != null
                  ? srsp.getException().getCause().getClass().getSimpleName()
                  : (th instanceof SolrServerException && th.getCause() != null
                      ? th.getCause().getClass().getSimpleName()
                      : th.getClass().getSimpleName()));
          rsp.getResponseHeader().remove(CommonParams.FAILED_SHARDS);
          rsp.getResponseHeader().add(CommonParams.FAILED_SHARDS, prslt);
        }
        log.error("Connection to shard [" + shardRsp.getShard() + "] did not succeed", th.getCause());
      } else {
        comm.cancelAll();
        if (th instanceof SolrException) {
          throw (SolrException) th;
        } else {
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, srsp.getException());
        }
      }
    }
  }
  rb.finished.add(srsp.getShardRequest());
  // (excerpt ends here, inside the while loop)
[/code]

[Note we also log the failure to the [local] server's log]

Your client can then extract the CommonParams.FAILED_SHARDS parameter and display and/or process accordingly.
Re: Faceting Question
Hi,

It's quite coincidental that I was just about to ask this very question to the forum experts. I think this is the same sort of thing Jamie was asking about (the only difference in my question is that the values won't be known at query time).

Is it possible to create a request that will return *multiple* facet ranges - 1 for each value of a given field (ideally, up to some facet.limit)?

For example: Let's say you query: user:* AND timestamp:[yesterday TO now], with a facet field of 'user'. Let's now say the faceting returns a count of 50, and there are 5 different values for 'user' - let's say user1, user2, user3, user4 and user5 (50 things happened over the last 24 hours by 5 different users).

Is it possible, in a single query, to get back 5 facet ranges over the 24hr period - one for each user? Or, do you simply have to do the search, and then iterate through each value returned and date facet on that?

Pivot faceting can give results for combinations of multiple facets, but not ranges.

Thanks,
Peter

On Sun, Jan 15, 2012 at 3:30 PM, Lee Carroll wrote:
>> Does
>> that make more sense?
>
> Ah I see.
>
> I'm not certain but take a look at pivot faceting
>
> https://issues.apache.org/jira/browse/SOLR-792
>
> cheers lee c
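For reference, the "search, then iterate" fallback described above could look roughly like the sketch below: one follow-up request per user value returned by the first facet, each filtered to that user and range-faceted over the last 24 hours. This is only an illustration of that fallback, not a single-query answer. The facet.range parameters and date math are standard Solr syntax (3.1 and later), while the one-hour gap, the class name and the method name are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: request parameters for one per-user range facet.
class PerUserRangeFacet {

    static Map<String, String> paramsFor(String user) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "timestamp:[NOW-1DAY TO NOW]");
        p.put("fq", "user:" + user);        // user value is not escaped here; escape special characters in real use
        p.put("rows", "0");                 // only the facet counts are needed
        p.put("facet", "true");
        p.put("facet.range", "timestamp");
        p.put("facet.range.start", "NOW-1DAY");
        p.put("facet.range.end", "NOW");
        p.put("facet.range.gap", "+1HOUR"); // assumed bucket size
        return p;
    }

    public static void main(String[] args) {
        // Values as they would come back from the initial facet on 'user'.
        String[] users = {"user1", "user2", "user3", "user4", "user5"};
        for (String user : users) {
            System.out.println(paramsFor(user));
        }
    }
}
```

With facet.limit on the first request bounding the number of users, this costs one extra request per user value.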
RE: GermanAnalyzer
> > What is an equivalent fieldType definition in Solr 3.5? > > > > OK, and if I would reindex, is this still the best practice config for german text?
Re: Getting started with indexing a database
Hi Mike,

Can you try removing ' from the nested entities? Just keep it in the top level entity.

Regards,
Rakesh Varna

On Wed, Jan 11, 2012 at 7:26 AM, Gora Mohanty wrote:
> On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary wrote:
> [...]
> > My data-config.xml file looks like this:
> >
> > > > > url="jdbc:mysql://localhost:3306/bioscope" user="db_user"
> password=""/>
> > > > >deltaQuery="SELECT doc_id FROM bioscope.docs where
> last_modified > '${dataimporter.last_index_time}'">
> > > > >
> Your SELECT above does not include the field "type"
> > >^^ This should be: WHERE id=='${docs.doc_id}' as 'id' is
> what >you are selecting in this entity.
>
> Same issue for the second nested entity, i.e., replace doc_id= with id=
>
> Regards,
> Gora
>
Re: xpathentityprocessor with flattern true
Try using flatten="true" in the rather than the . Note that it will remove all child node names, and will only concatenate the text values of the child nodes.

example:
abc def ghi>/id>
will concatenate abc, def, ghi to give a single text value.
Note that xpath terminates at

Regards,
Rakesh Varna

On Mon, Jan 9, 2012 at 8:32 AM, vrpar...@gmail.com wrote:
> am i making any mistake with xpathentityprocessor?
>
> i am using solr 1.4
>
> please help me to solve this problem?
>
>
>
> Thanks & Regards,
> Vishal Parekh
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/xpathentityprocessor-with-flattern-true-tp3637928p3645013.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
Re: Synonym configuration not working?
Yes and No.

If using the Synonyms functionality out of the box you have to do it at index time. But if using it at query time, like we do, you have to do some programming. We have connected a thesaurus which is actually using the synonyms functionality at query time. There are some pitfalls to take care of.

Bernd

On 15.01.2012 07:07, Michael Lissner wrote:
> Just replying for others in the future. The answer to this is to do synonyms at index time, not at query time.
>
> Mike
>
> On Fri 06 Jan 2012 02:35:23 PM PST, Michael Lissner wrote:
> > I'm trying to set up some basic synonyms. The one I've been working on is:
> >
> > us, usa, united states
> >
> > My understanding is that adding that to the synonym file will allow users to search for US, and get back documents containing usa or united states. Ditto for if a user puts in usa or united states.
> >
> > Unfortunately, with this in place, when I do a search, I get the results for items that contain all three of the words - it's doing an AND of the synonyms rather than an OR.
> >
> > If I turn on debugging, this is indeed what I see (plus some stemming):
> >
> > (+DisjunctionMaxQuery(((westCite:us westCite:usa westCite:unit) | (text:us text:usa text:unit) | (docketNumber:us docketNumber:usa docketNumber:unit) | ((status:us status:usa status:unit)^1.25) | (court:us court:usa court:unit) | (lexisCite:us lexisCite:usa lexisCite:unit) | ((caseNumber:us caseNumber:usa caseNumber:unit)^1.25) | ((caseName:us caseName:usa caseName:unit)^1.5/no_coord
> >
> > Am I doing something wrong to cause this? My defaultOperator is set to AND, but I'd expect the synonym filter to understand that.
> >
> > Any help?
> >
> > Thanks,
> >
> > Mike