Re: FastVectorHighlighter wiki corrections

2012-01-15 Thread Koji Sekiguchi

Hi Mike,

(12/01/11 16:14), Michael Lissner wrote:

- I need help with fragsize. The wiki says to set it to either 0 or a huge
number to disable fragmenting. Which is it?


That applies to the original Highlighter.


- The wiki says that hl.useFastVectorHighlighter defaults to false. I read
somewhere that FVH is used automatically when the data has been indexed with
termVectors, termPositions and termOffsets. Is that correct?


Not correct. To use FVH, you need to set the hl.useFastVectorHighlighter
parameter to true at query time, and index the highlighting fields with
termVectors, termPositions and termOffsets.
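A minimal sketch of the query-time half, using only the JDK to build the request parameters. The class name and the field name `text` are hypothetical; `hl`, `hl.fl` and `hl.useFastVectorHighlighter` are the Solr highlighting parameters discussed here. The index-time half lives in the schema: the field listed in `hl.fl` also needs termVectors="true", termPositions="true" and termOffsets="true" on its field definition.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds the query string of a highlighting request that asks for FVH. */
final class FvhRequest {
    static String queryString(String q, String highlightField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("hl", "true");                          // turn highlighting on
        params.put("hl.fl", highlightField);               // field(s) to highlight
        params.put("hl.useFastVectorHighlighter", "true"); // FVH is opt-in per request
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

For example, `FvhRequest.queryString("title:queen", "text")` yields `q=title%3Aqueen&hl=true&hl.fl=text&hl.useFastVectorHighlighter=true`.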

koji
--
http://www.rondhuit.com/en/


Re: best query for one-box search string over multiple types & fields?

2012-01-15 Thread François Schiettecatte
Johnny 

What you are going to want to do is boost the artist field relative to the
others. For example, using edismax, my 'qf' parameter is:

number^5 title^3 default

so hits in the number field get a five-fold boost and hits in the title field 
get a three-fold boost. In your case you might want to start with:

artist^5 album^3 song
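As a small illustration of the `qf` syntax, this hypothetical helper renders per-field boosts into the space-separated `field^boost` form, leaving a boost of 1 implicit (as with `song` above). The field names are placeholders; with a schema like Johnny's they would be fields such as artistName, albumName and songName. The resulting string would be sent as `qf` alongside `defType=edismax`.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that renders an edismax qf value from per-field boosts (iteration order is kept). */
final class QueryFields {
    static String qf(Map<String, Double> boosts) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            double boost = e.getValue();
            if (boost == 1.0) {
                joiner.add(e.getKey());                        // 1 is the default boost, so omit it
            } else if (boost == Math.rint(boost)) {
                joiner.add(e.getKey() + "^" + (long) boost);   // whole numbers without ".0"
            } else {
                joiner.add(e.getKey() + "^" + boost);
            }
        }
        return joiner.toString();
    }
}
```

With an insertion-ordered map of artist=5, album=3, song=1 this produces `artist^5 album^3 song`.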

Getting these parameters right will take a little work, and I would suggest you
build a set of searches with known results so you can quickly check the effect
of any tweaks you make.
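One way to keep such a set of known-result searches is a tiny regression check like the one below. It is a sketch under assumptions: each query is mapped to the document id you expect as the first hit, and the search call is passed in as a function (for example a wrapper around your Solr client) so the check itself has no Solr dependency. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical relevance regression check over queries with a known expected top hit. */
final class RelevanceCheck {
    /** Returns the queries whose first result was not the expected document id. */
    static List<String> failures(Map<String, String> expectedTopHit,
                                 Function<String, List<String>> search) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> e : expectedTopHit.entrySet()) {
            List<String> ids = search.apply(e.getKey()); // ranked document ids for this query
            if (ids.isEmpty() || !ids.get(0).equals(e.getValue())) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

Re-running it after every change to `qf` or other boosts shows at a glance which queries regressed.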

Useful reading would include:

http://wiki.apache.org/solr/SolrRelevancyFAQ

http://wiki.apache.org/solr/SolrRelevancyCookbook


http://www.lucidimagination.com/blog/2011/12/14/options-to-tune-document’s-relevance-in-solr/


http://www.lucidimagination.com/blog/2011/03/10/solr-relevancy-function-queries/

Cheers

François


On Jan 15, 2012, at 1:19 AM, Johnny Marnell wrote:

> hi all,
> 
> short of it: i want "queen bohemian rhapsody" to return that song named
> "Bohemian Rhapsody" by the artist named "Queen", rather than songs with
> titles like "Bohemian Rhapsody (Queen Cover)".
> 
> i'm indexing a catalog of music with these types of docs and their fields:
> 
> artist (artistName), album (albumName, artistName), and song (songName,
> albumName, artistName).
> 
> the client is one search box, and i'm having trouble handling searching
> over multiple multifields and weighting their exactness.  when a user types
> "queen", i want the artist Queen to be the first hit, and then albums &
> songs titled "queen".
> 
> if "queen bohemian rhapsody" is searched, i want to return that song, but
> instead i'm getting songs like "Bohemian Rhapsody (Queen Cover)" by "Stupid
> Queen Tribute Band" because all three terms are in the songName, i'm
> guessing.  what kind of query do i need?
> 
> i'm indexing all of these fields as multi-fields with ngram, shingle (i
> think this might be really useful for my use case?), keyword, and standard.
> that appears to be working, but i'm not sure how to combine all of this
> together over multiple multi-fields.
> 
> if anyone has good links to broadly summarized use cases of Indexing and
> Querying, that would be great - i would think this would be a common
> situation but i can't find any good resources on the web.  and i'm having
> trouble understanding scoring and boosting.
> 
> this was my first post, hope i did it right, thanks so much!
> 
> -j



Re: Faceting Question

2012-01-15 Thread Lee Carroll
>  Does
> that make more sense?

Ah I see.

I'm not certain but take a look at pivot faceting

https://issues.apache.org/jira/browse/SOLR-792

cheers lee c


Re: Determining which shard is failing using partialResults / some other technique?

2012-01-15 Thread Peter Sturge
Hi,

There are a couple of ways of handling this.

One is to do it from the 'client' side, i.e. do a Solr ping to each
shard beforehand to find out which shards, if any, are unavailable. This
may not always work if you use forwarders/proxies etc.
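A sketch of the client-side ping approach, assuming the shards are listed the way the `shards` parameter lists them (comma-separated `host:port/path` entries) and that each shard exposes Solr's `/admin/ping` handler under that path. The class and method names are hypothetical, and the availability check is passed in so it can be swapped out; the default `httpOk` treats anything other than HTTP 200 within the timeout as down. As noted above, a proxy in front of the shards can make this check misleading.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical pre-query check that pings every shard and reports the ones that do not answer. */
final class ShardPinger {
    /** Returns the shards for which isUp rejects the shard's ping URL. */
    static List<String> unavailable(String shardsParam, Predicate<String> isUp) {
        List<String> down = new ArrayList<>();
        for (String shard : shardsParam.split(",")) {
            String s = shard.trim();
            if (!s.isEmpty() && !isUp.test(pingUrl(s))) {
                down.add(s);
            }
        }
        return down;
    }

    /** e.g. "localhost:8983/solr" becomes "http://localhost:8983/solr/admin/ping". */
    static String pingUrl(String shard) {
        return "http://" + shard + "/admin/ping";
    }

    /** Default check: the ping URL must return HTTP 200 within the timeout. */
    static boolean httpOk(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                return conn.getResponseCode() == 200;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return false; // refused, timed out, or malformed URL: treat as unavailable
        }
    }
}
```

Usage would look like `ShardPinger.unavailable(shards, url -> ShardPinger.httpOk(url, 2000))`.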

What we do is add the names of all failed shards to the
CommonParams.FAILED_SHARDS parameter in the response header (if
partialResults=true), retrieving the current list (if any) and
appending to it:

Excerpt from SearchHandler.java : handleRequestBody():
[code]
  log.info("Waiting for shard replies...");
  // now wait for replies, but if anyone puts more requests on
  // the outgoing queue, send them out immediately (by exiting
  // this loop)
  while (rb.outgoing.size() == 0) {
    ShardResponse srsp = comm.takeCompletedOrError();
    if (srsp == null) break;  // no more requests to wait for

    // If any shard does not respond (ConnectException) we respond with
    // other shards and set partialResults to true
    for (ShardResponse shardRsp : srsp.getShardRequest().responses) {
      Throwable th = shardRsp.getException();
      if (th != null) {
        log.info("Got shard exception for: " + srsp.getShard()
            + " : " + th.getClass().getName() + " cause: " + th.getCause());
        if (th instanceof SolrServerException && th.getCause() instanceof Exception) {
          // Was there an exception and return partial results is false?
          // If so, abort everything and rethrow
          if (failOnShardFailure) {
            log.info("Not set for partial results. Aborting...");
            comm.cancelAll();
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, th);
          }

          // Describe this failure as "shardName|ExceptionType", preferring
          // the class name of the underlying cause when there is one
          String failure = shardRsp.getShard() + "|" +
              (srsp.getException() != null && srsp.getException().getCause() != null
                  ? srsp.getException().getCause().getClass().getSimpleName()
                  : (th instanceof SolrServerException && th.getCause() != null
                      ? th.getCause().getClass().getSimpleName()
                      : th.getClass().getSimpleName()));

          Object failedShards = rsp.getResponseHeader().get(CommonParams.FAILED_SHARDS);
          if (failedShards == null) {
            rsp.getResponseHeader().add(CommonParams.FAILED_SHARDS, failure);
          } else {
            // Append this failure to the existing value, delimiting
            // multiple failed shards with ';'
            rsp.getResponseHeader().remove(CommonParams.FAILED_SHARDS);
            rsp.getResponseHeader().add(CommonParams.FAILED_SHARDS,
                failedShards + ";" + failure);
          }
          log.error("Connection to shard [" + shardRsp.getShard() + "] did not succeed",
              th.getCause());
        } else {
          comm.cancelAll();
          if (th instanceof SolrException) {
            throw (SolrException) th;
          } else {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                srsp.getException());
          }
        }
      }
    }
    rb.finished.add(srsp.getShardRequest());
    // ... (rest of the loop body not shown in this excerpt)
  }
[/code]

[Note we also log the failure to the [local] server's log]
Your client can then extract the CommonParams.FAILED_SHARDS parameter
and display and/or process it accordingly.
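On the client side, the value written by the excerpt above is a string of `shard|ExceptionType` entries separated by `;`. A minimal, hypothetical parser for that layout follows; it assumes only the format produced by the code above, not any particular client library or header key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a "shard|ExceptionType;shard|ExceptionType" failed-shards value. */
final class FailedShards {
    /** Maps each failed shard to the exception type recorded for it (empty if none). */
    static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return result; // no shard failed
        }
        for (String entry : value.split(";")) {
            int bar = entry.lastIndexOf('|');
            if (bar < 0) {
                result.put(entry, "");
            } else {
                result.put(entry.substring(0, bar), entry.substring(bar + 1));
            }
        }
        return result;
    }
}
```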


Re: Faceting Question

2012-01-15 Thread Peter Sturge
Hi,

It's quite a coincidence: I was just about to ask this very
question to the forum experts.
I think this is the same sort of thing Jamie was asking about. (The
only difference in my question is that the values won't be known at
query time.)

Is it possible to create a request that will return *multiple* facet
ranges, one for each value of a given field (ideally, up to some
facet.limit)?

For example: Let's say you query: user:* AND timestamp:[yesterday TO
now], with a facet field of 'user'.
Let's now say the faceting returns a count of 50, and there are 5
different values for 'user' - let's say user1, user2, user3, user4 and
user5 (50 things happened over the last 24 hours by 5 different
users).

Is it possible, in a single query, to get back 5 facet ranges over the
24hr period - one for each user? Or, do you simply have to do the
search, and then iterate through each value returned and date facet on
that?

Pivot faceting can give results for combinations of multiple facets,
but not ranges.
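The fallback described above (facet on `user`, then one range-facet request per returned value) could be sketched as follows. The field names `user` and `timestamp` come from the example; the `*:*` query, the one-hour gap and the class name are assumptions. `facet.range.*` are Solr's range-faceting parameters (older releases used the `facet.date.*` equivalents).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for one range-facet request per value of the 'user' field. */
final class PerUserRangeFacets {
    static List<Map<String, String>> requests(List<String> users) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String user : users) {
            Map<String, String> p = new LinkedHashMap<>();
            p.put("q", "*:*");
            p.put("fq", "user:\"" + escapePhrase(user) + "\""); // restrict to this user
            p.put("rows", "0");                                // counts only, no documents
            p.put("facet", "true");
            p.put("facet.range", "timestamp");
            p.put("facet.range.start", "NOW-1DAY");
            p.put("facet.range.end", "NOW");
            p.put("facet.range.gap", "+1HOUR");                // assumed bucket size
            out.add(p);
        }
        return out;
    }

    /** Escapes backslashes and double quotes so the value is safe inside a quoted phrase. */
    static String escapePhrase(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

This costs one extra request per distinct user, which is the overhead a single-request solution would avoid.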

Thanks,
Peter




On Sun, Jan 15, 2012 at 3:30 PM, Lee Carroll wrote:
>>  Does
>> that make more sense?
>
> Ah I see.
>
> I'm not certain but take a look at pivot faceting
>
> https://issues.apache.org/jira/browse/SOLR-792
>
> cheers lee c


RE: GermanAnalyzer

2012-01-15 Thread spring
> > What is an equivalent fieldType definition in Solr 3.5?
> 
> 
>   
> 

OK, and if I reindex, is this still the best-practice config for
German text?



Re: Getting started with indexing a database

2012-01-15 Thread Rakesh Varna
Hi Mike,
   Can you try removing '  from the
nested entities? Just keep it in the top level entity.

Regards,
Rakesh Varna

On Wed, Jan 11, 2012 at 7:26 AM, Gora Mohanty  wrote:

> On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary  wrote:
> [...]
> > My data-config.xml file looks like this:
> >
> > 
> >   >  url="jdbc:mysql://localhost:3306/bioscope" user="db_user"
> password=""/>
> >  
> > >deltaQuery="SELECT doc_id FROM bioscope.docs where
> last_modified > '${dataimporter.last_index_time}'">
> >  
> >  
>
> Your SELECT above does not include the field "type"
>
> ^^ This should be: WHERE id='${docs.doc_id}', as 'id' is what
> you are selecting in this entity.
>
> Same issue for the second nested entity, i.e., replace doc_id= with id=
>
> Regards,
> Gora
>


Re: xpathentityprocessor with flattern true

2012-01-15 Thread Rakesh Varna
Try using flatten="true" in the  rather than the . Note that
it will remove all child node names, and will only concatenate the text
values of the child nodes.
example:



abc
def
ghi>/id>




will concatenate abc, def, ghi to give a single text value. Note that xpath
terminates at 
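To illustrate what "remove all child node names and only concatenate the text values" means, here is the same idea with the JDK's DOM API. This is only an analogy, not DataImportHandler code; the element names are hypothetical, and whether DIH puts a separator between the pieces is not stated here, so check the indexed value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Analogy for flattening: the text of an element's descendants, without their tag names. */
final class FlattenDemo {
    static String flattenedText(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node node = doc.getElementsByTagName(elementName).item(0);
            if (node == null) {
                throw new IllegalArgumentException("no <" + elementName + "> element");
            }
            // getTextContent() concatenates the text of every descendant text node,
            // dropping the child element names
            return node.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

For `<doc><id><a>abc</a><b>def</b><c>ghi</c></id></doc>` and element `id`, this returns `abcdefghi`.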

Regards,
Rakesh Varna
On Mon, Jan 9, 2012 at 8:32 AM, vrpar...@gmail.com wrote:

> am i making any mistake with xpathentityprocessor?
>
> i am using solr 1.4
>
> please help me to solve this problem?
>
>
>
> Thanks & Regards,
> Vishal Parekh
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/xpathentityprocessor-with-flattern-true-tp3637928p3645013.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Synonym configuration not working?

2012-01-15 Thread Bernd Fehling

Yes and no.
If you use the synonym functionality out of the box, you have to do it at index time.

But if you use it at query time, like we do, you have to do some programming.
We have connected a thesaurus which actually uses the synonym functionality at
query time. There are some pitfalls to take care of.

Bernd

Am 15.01.2012 07:07, schrieb Michael Lissner:

Just replying for others in the future: the answer to this is to do synonyms at
index time, not at query time.
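A toy model of why index-time expansion gives the behavior Mike wanted, assuming the equivalence line `us, usa, united states` with `expand="true"`: a document mentioning any one member of the group is indexed with all members, so a query for any single member matches it. This is not Solr's SynonymFilter; it treats each member as a whole, lower-cased phrase and ignores the tokenization of multi-word entries such as `united states`, which the real filter has to handle. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of an expanded synonym equivalence group applied at index time. */
final class SynonymGroups {
    /** Parses "us, usa, united states" into a map from each member to the whole group. */
    static Map<String, List<String>> parseEquivalenceLine(String line) {
        List<String> group = new ArrayList<>();
        for (String part : line.split(",")) {
            String term = part.trim().toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) group.add(term);
        }
        List<String> shared = Collections.unmodifiableList(group);
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String term : shared) map.put(term, shared);
        return map;
    }

    /** Terms indexed for a phrase: the phrase itself plus every member of its group, if any. */
    static Set<String> indexTerms(String phrase, Map<String, List<String>> synonyms) {
        String key = phrase.trim().toLowerCase(Locale.ROOT);
        Set<String> terms = new LinkedHashSet<>();
        terms.add(key);
        List<String> group = synonyms.get(key);
        if (group != null) terms.addAll(group);
        return terms;
    }
}
```

A document containing "USA" would then carry `usa`, `us` and `united states`, so a plain query for `us` finds it without any query-time expansion.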

Mike

On Fri 06 Jan 2012 02:35:23 PM PST, Michael Lissner wrote:

I'm trying to set up some basic synonyms. The one I've been working on is:

us, usa, united states

My understanding is that adding that to the synonym file will allow users to
search for US and get back documents containing usa or united states. Ditto if
a user types usa or united states.

Unfortunately, with this in place, when I do a search, I get the results for
items that contain all three of the words - it's doing an AND of the synonyms
rather than an OR.

If I turn on debugging, this is indeed what I see (plus some stemming):
(+DisjunctionMaxQuery(((westCite:us westCite:usa westCite:unit) | (text:us 
text:usa text:unit) | (docketNumber:us docketNumber:usa
docketNumber:unit) | ((status:us status:usa status:unit)^1.25) | (court:us 
court:usa court:unit) | (lexisCite:us lexisCite:usa lexisCite:unit)
| ((caseNumber:us caseNumber:usa caseNumber:unit)^1.25) | ((caseName:us 
caseName:usa caseName:unit)^1.5/no_coord

Am I doing something wrong to cause this? My defaultOperator is set to AND,
but I'd expect the synonym filter to understand that.

Any help?

Thanks,

Mike