Reverse query?

2015-10-02 Thread remi tassing
Hi,
I have medium-low experience with Solr and I have a question I couldn't
quite solve yet.

Typically we have quite short query strings (a couple of words) and the
search is done through a set of bigger documents. What if the logic is
turned a little bit around. I have a document and I need to find out what
strings appear in the document. A string here could be a person name
(including space for example) or a location...which are indexed in Solr.

A concrete example, taking this text from Wikipedia (Mad Max):

"Mad Max is a 1979 Australian dystopian action film directed by George
Miller. Written by Miller and James McCausland from a story by Miller and
producer Byron Kennedy, it tells a story of societal breakdown, murder, and
vengeance. The film, starring the then-little-known Mel Gibson, was released
internationally in 1980. It became a top-grossing Australian film, while
holding the record in the Guinness Book of Records for decades as the most
profitable film ever created,[1] and has been credited for further opening
the global market to Australian New Wave films."

I would like it to match "Mad Max" but not "Mad" or "Max" separately, and
"George Miller", "global market" ...

I've tried the KeywordTokenizer but it didn't work. I suppose it's OK at
index time but not at query time (in this specific case).

I had a look at Luwak but it's not what I'm looking for (
http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/
)

The typical name search doesn't seem to work either,
https://dzone.com/articles/tips-name-search-solr

I was thinking this problem must have already been solved... or?

Remi


Re: Reverse query?

2015-10-02 Thread remi tassing
Hi,

@Erik: Yes I'm using the admin-ui and yes I quickly noticed the
KeywordTokenizer wouldn't work
@All: sorry for not explaining properly, I'm aware of the phrase query and
a little bit of the N-Gram.

So to simplify my problem, the documents indexed are:
id:1, content:Mad Max
id:2, content:George Miller
id:3, content:global market
id:4, content:Solr development

Now the query is the content of the wiki page at
https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29

the results id:1, id:2, id:3 should be returned but not id:4. Today I'm
able to do this with something similar to grep (Aho-Corasick) but the list
is growing bigger and bigger. I thought Solr/Lucene could tackle this more
efficiently and also add other capabilities like filtering ...
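
For illustration, the matching behaviour described above can be sketched
outside Solr (a naive Python phrase matcher; the whole-word matching is the
point, everything else here is an assumption):

```python
import re

def find_indexed_strings(document, indexed_strings):
    """Return the indexed strings that occur in the document as
    whole-word phrases (so "Mad Max" matches, a lone "Max" does not)."""
    text = document.lower()
    return [s for s in indexed_strings
            if re.search(r"\b" + re.escape(s.lower()) + r"\b", text)]

doc = "Mad Max is a 1979 Australian film directed by George Miller."
print(find_indexed_strings(
    doc, ["Mad Max", "George Miller", "global market", "Solr development"]))
```

This is essentially what grep/Aho-Corasick does; the question is how to get
the same behaviour, plus scoring and filtering, from the index.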

Maybe there is another tool more suitable for the job?

Remi


On Fri, Oct 2, 2015 at 10:07 PM, Andrea Roggerone <
andrearoggerone.o...@gmail.com> wrote:

> Hi, the phrase query format would be:
> "Mad Max"~2
> The * has been added by the mail aggregator around the chars in Bold for
> some reason. That wasn't a wildcard.
>
> On Friday, October 2, 2015, Roman Chyla  wrote:
>
> > I'd like to offer another option:
> >
> > you say you want to match long query into a document - but maybe you
> > won't know whether to pick "Mad Max" or "Max is" (not mentioning the
> > performance hit of "*mad max*" search - or is it not the case
> > anymore?). Take a look at the NGram tokenizer (say size of 2; or
> > bigger). What it does, it splits the input into overlapping segments
> > of 'X' words (words, not characters - however, characters work too -
> > just pick bigger N)
> >
> > mad max
> > max 1979
> > 1979 australian
> >
> > i'd recommend placing stopfilter before the ngram
> >
> >  - then for the long query string of "Hey Mad Max is 1979" you
> > would search "hey mad" OR "mad max" OR "max 1979"... (perhaps the query
> > tokenizer could be convinced to do the search for you automatically). And
> > voila, the more overlapping segments match, the higher the search
> > result scores.
> >
> > hth,
> >
> > roman
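
Roman's overlapping word segments are what Lucene calls shingles; the idea
can be sketched in plain Python (the stopword list here is an arbitrary
assumption):

```python
STOPWORDS = {"is", "a", "by", "the", "and", "of"}  # assumed stop list

def shingles(text, n=2):
    """Overlapping n-word segments after stopword removal, mirroring a
    StopFilter followed by a word-level shingle/N-gram step."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def to_query(text):
    # each segment becomes a phrase clause; the more clauses an indexed
    # document matches, the higher it scores
    return " OR ".join('"%s"' % s for s in shingles(text))

print(shingles("Mad Max is a 1979 Australian film"))
print(to_query("Hey Mad Max is 1979"))
```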
> >
> >
> >
> > On Fri, Oct 2, 2015 at 12:03 PM, Erick Erickson  > > wrote:
> > > The admin/analysis page is your friend here, find it and use it ;)
> > > Note you have to select a core on the admin UI screen before you can
> > > see the choice.
> > >
> > > Because apart from the other comments, KeywordTokenizer is a red flag.
> > > It does NOT break anything up into tokens, so if your doc contains:
> > > Mad Max is a 1979 Australian
> > > as the whole field, the _only_ match you'll ever get is if you search
> > exactly
> > > "Mad Max is a 1979 Australian"
> > > Not Mad, not mad, not Max, exactly all 6 words separated by exactly one
> > space.
> > >
> > > Andrea's suggestion is the one you want, but be sure you use one of
> > > the tokenizing analysis chains, perhaps start with text_en (in the
> > > stock distro). Be sure to completely remove your node/data directory
> > > (as in rm -rf data) after you make the change.
> > >
> > > And really, explore the admin/analysis page; it's where a LOT of these
> > > kinds of problems find solutions ;)
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Oct 2, 2015 at 7:57 AM, Ravi Solr  > > wrote:
> > >> Hello Remi,
> > >> I am assuming the field where you store the data is analyzed.
> > >> The field definition might help us answer your question better. If you
> > >> are using the edismax handler for your search requests, I believe you
> > >> can achieve your goal by setting your "mm" to 100%, and the phrase slop
> > >> "ps" and query slop "qs" parameters to zero. I think that will force
> > >> exact matches.
> > >>
> > >> Thanks
> > >>
> > >> Ravi Kiran Bhaskar
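
As request parameters, Ravi's suggestion would look roughly like this (the
field name and handler are assumptions, not a tested configuration):

```
/select?defType=edismax&qf=content&mm=100%25&ps=0&qs=0&q=mad max
```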
> > >>
> > >> On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone <
> > >> andrearoggerone.o...@gmail.com > wrote:
> > >>
> > >>> Hi Remy,
> > >>> The question is not really clear, could you explain a little bit
> better
> > >>> what you need? Reading your email I understand that you want to get
> > >>> documents containing all the search terms typed. For instance if you
> > search
> > >>> for "

Re: Reverse query?

2015-10-03 Thread remi tassing
@Jack: After reading the documentation, I think the percolator is what I'm
after. The filtering possibility is extremely appealing as well. I'll have
a closer look and experiment a bit.

@Erik: Yes that's right, notification is not really needed in my case
though. It should be doable as you said… the percolator could be a good
reference.

Thank you all guys!
On Oct 3, 2015 6:08 PM, "Erick Erickson"  wrote:

> OK, finally the light dawns. You're doing something akin to "alerts":
> that is, store a bunch of queries, then when a new document comes
> in find out if any of the queries would match the doc and send
> out alerts to each user who has entered a query like that. Your
> situation may not be doing exactly that, but some kind of alerting
> mechanism would work, right?
>
> There are several approaches, Googling  "solr alerts" will
> turn up several. Lucidworks, Flax and others have built some
> tools (some commercial) for this ability.
>
> One way to approach this is to store the queries "somewhere",
> perhaps in a DB, perhaps in their own Solr collection, and write
> a custom component that takes an incoming document and puts
> it in a MemoryIndex, runs the queries against it and sends
> the alerts. This requires some lower-level programming, but is
> quite do-able.
>
> Best,
> Erick
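
The store-queries-and-match-incoming-documents flow can be sketched like
this (toy Python standing in for "queries in a DB + MemoryIndex"; real
stored queries would be Lucene queries, not plain phrases):

```python
# the stored "queries" are plain phrases here; in the real design each
# would be a Lucene query run against a MemoryIndex of the new document
stored_queries = {
    "alert-mad-max": "mad max",
    "alert-miller": "george miller",
    "alert-solr": "solr development",
}

def alerts_for(document):
    """Return the ids of stored queries that match the incoming document."""
    text = " ".join(document.lower().split())
    return sorted(qid for qid, phrase in stored_queries.items()
                  if phrase in text)

print(alerts_for("Mad Max was directed by George Miller"))
```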
>
> On Sat, Oct 3, 2015 at 7:02 AM, Gili Nachum  wrote:
> > Check if MLT (more like this) could fit your requirements.
> > https://wiki.apache.org/solr/MoreLikeThis
> >
> > If your requirements are more specific I think your client program should
> > tokenize the target document then construct one or more queries like:
> > "token token2" OR "token2 token3" OR ...
> >
> > I'm not sure how you get the list of tokens; perhaps using the same API
> > that the analysis admin page uses (haven't checked).
> > On Oct 3, 2015 09:32, "remi tassing"  wrote:
> >
> >> Hi,
> >>
> >> @Erik: Yes I'm using the admin-ui and yes I quickly notice
> keywordTokenizer
> >> couldn't work
> >> @All: sorry for not explaining properly, I'm aware of the phrase query
> and
> >> a little bit of the N-Gram.
> >>
> >> So to simplify my problem, the documents indexed are:
> >> id:1, content:Mad Max
> >> id:2, content:George Miller
> >> id:3, content:global market
> >> id:4, content:Solr development
> >>
> >> Now the query is the content of the wiki page at
> >> https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29
> >>
> >> the results id:1, id:2, id:3 should be returned but not id:4. Today I'm
> >> able to do this with something similar to grep (Aho-corasick) but the
> list
> >> is growing bigger and bigger. I thought Solr/Lucene could tackle this
> more
> >> efficiently and also add other capabilities like filtering ...
> >>
> >> Maybe there is another tool more suitable for the job?
> >>
> >> Remi
> >>
> >>
> >> On Fri, Oct 2, 2015 at 10:07 PM, Andrea Roggerone <
> >> andrearoggerone.o...@gmail.com> wrote:
> >>
> >> > Hi, the phrase query format would be:
> >> > "Mad Max"~2
> >> > The * has been added by the mail aggregator around the chars in Bold
> for
> >> > some reason. That wasn't a wildcard.
> >> >
> >> > On Friday, October 2, 2015, Roman Chyla 
> wrote:
> >> >
> >> > > I'd like to offer another option:
> >> > >
> >> > > you say you want to match long query into a document - but maybe you
> >> > > won't know whether to pick "Mad Max" or "Max is" (not mentioning the
> >> > > performance hit of "*mad max*" search - or is it not the case
> >> > > anymore?). Take a look at the NGram tokenizer (say size of 2; or
> >> > > bigger). What it does, it splits the input into overlapping segments
> >> > > of 'X' words (words, not characters - however, characters work too -
> >> > > just pick bigger N)
> >> > >
> >> > > mad max
> >> > > max 1979
> >> > > 1979 australian
> >> > >
> >> > > i'd recommend placing stopfilter before the ngram
> >> > >
> >> > >  - then for the long query string of "Hey Mad Max is 1979" you
> >> > > would search "hey mad" OR "mad max" OR "max 1979"... (p

Re: Reverse query?

2015-10-05 Thread remi tassing
Hi Alan,

I became aware of Luwak a few months ago and I'm planning on using it in
the future. The only reason I couldn’t use it for my specific scenario was
the fact that I needed the possibility to filter on the fly and not
necessarily include filtering while building the query index. Apparently
from the description, the percolator API in Elasticsearch supports this.

I might be wrong, so I'll have to experiment a little bit first.

Remi

On Mon, Oct 5, 2015 at 1:58 PM, Alan Woodward  wrote:

> Hi Remi,
>
> Your use-case is more-or-less exactly what I wrote luwak for:
> https://github.com/flaxsearch/luwak.  You register your queries with a
> Monitor object, and then match documents against them.  The monitor
> analyzes the documents that are passed in and tries to filter out queries
> that it can detect won't match ahead of time, which is particularly useful
> if some of your queries are complex and expensive to run.
>
> We've found that luwak performs better than the percolator out of the box (
> http://www.flax.co.uk/blog/2015/07/27/a-performance-comparison-of-streamed-search-implementations/),
> but depending on how many queries you have and how complex they are you may
> find that the percolator is a lot easier to set up, as it comes bundled as
> part of elasticsearch while luwak is just a Java library, and will require
> some coding to get it up and running.
>
> Alan Woodward
> www.flax.co.uk
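
The presearcher idea described above can be sketched conceptually (toy
Python; luwak's real API is Java and works on Lucene queries):

```python
def candidate_queries(document, stored_queries):
    """Cheap pre-filter: keep only queries sharing at least one term
    with the document -- roughly what a presearcher does before the
    expensive real matching step."""
    doc_terms = set(document.lower().split())
    return {qid: q for qid, q in stored_queries.items()
            if doc_terms & set(q.lower().split())}

queries = {"q1": "mad max", "q2": "solr development"}
print(candidate_queries("Mad Max film", queries))
```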
>
>
> On 3 Oct 2015, at 23:05, remi tassing wrote:
>
> > @Jack: After reading the documentation, I think the percolator is what I'm
> > after. The filtering possibility is extremely appealing as well. I'll
> have
> > a closer look and experiment a bit.
> >
> > @Erik: Yes that's right, notification is not really needed in my case
> > though. It should be doable as you said… the percolator could be a good
> > reference.
> >
> > Thank you all guys!
> > On Oct 3, 2015 6:08 PM, "Erick Erickson" 
> wrote:
> >
> >> OK, finally the light dawns. You're doing something akin to "alerts".
> >> that is, store a bunch of queries, then when a new document comes
> >> in find out if any of the queries would match the doc and send
> >> out alerts to each user who has entered a query like that. Your
> >> situation may not be doing exactly that, but some kind of alerting
> >> mechanism would work, right?
> >>
> >> There are several approaches, Googling  "solr alerts" will
> >> turn up several. Lucidworks, Flax and others have built some
> >> tools (some commercial) for this ability.
> >>
> >> One way to approach this is to store the queries "somewhere",
> >> perhaps in a DB, perhaps in their own Solr collection, and write
> >> a custom component that takes an incoming document and puts
> >> it in a MemoryIndex, runs the queries against it and sends
> >> the alerts. This requires some lower-level programming, but is
> >> quite do-able.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Oct 3, 2015 at 7:02 AM, Gili Nachum 
> wrote:
> >>> Check if MLT (more like this) could fit your requirements.
> >>> https://wiki.apache.org/solr/MoreLikeThis
> >>>
> >>> If your requirements are more specific I think your client program
> should
> >>> tokenize the target document then construct one or more queries like:
> >>> "token token2" OR "token2 token3" OR ...
> >>>
> >>> I'm not sure how you get the list of tokens , perhaps using the same
> api
> >>> that the analyze admin page uses (haven't  checked )
> >>> On Oct 3, 2015 09:32, "remi tassing"  wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> @Erik: Yes I'm using the admin-ui and yes I quickly notice
> >> keywordTokenizer
> >>>> couldn't work
> >>>> @All: sorry for not explaining properly, I'm aware of the phrase query
> >> and
> >>>> a little bit of the N-Gram.
> >>>>
> >>>> So to simplify my problem, the documents indexed are:
> >>>> id:1, content:Mad Max
> >>>> id:2, content:George Miller
> >>>> id:3, content:global market
> >>>> id:4, content:Solr development
> >>>>
> >>>> Now the query is the content of the wiki page at
> >>>> https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29
> >>>>
> >>>> the results id:1, id:2, id:3 should be returned bu

Result merging takes too long

2014-03-11 Thread remi tassing
Hi,

I've just set up a SolrCloud with Tomcat: 5 shards with one replica each,
and 10 million docs in total (evenly distributed).

I've noticed the query response time is faster than using one single node
but still not as fast as I expected.

After turning debugQuery on, I noticed the query time is different from the
value returned in the debug explanation (see the excerpt below). More
importantly, when making a query to one, and only one, shard, the result
is consistent. It appears the server spends most of its time doing
result aggregation (merging).

After searching Google in vain, I didn't find anything concrete except a
hint that the problem could be in a 'SearchComponent'.

Could you point me in the right direction (e.g. configuration...)?

Thanks!

Remi

Solr Cloud result (the XML element names were stripped by the list archive;
the surviving values, in standard debug-timing order, are):

status=0, QTime=3471
params: debugQuery=on, q=project development agile
result: maxScore=0.17022902, docs elided
timing: time=508.0
  prepare: time=8.0 (query=8.0; facet, mlt, highlight, stats, debug all 0.0)
  process: time=499.0 (query=195.0, facet=0.0, mlt=0.0, highlight=228.0,
           stats=0.0, debug=76.0)








Re: Result merging takes too long

2014-03-13 Thread remi tassing
Hi Erick,

I've used the fl=id parameter to avoid retrieving the actual documents
(step <4> in your mail) but the problem still exists.
Any ideas on how to find the merging time (step <3>)?

Remi


On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson wrote:

> In SolrCloud there are a couple of round trips
> that _may_ be what you're seeing.
>
> First, though, the QTime is the time spent
> querying, it does NOT include assembling
> the documents from disk for return etc., so
> bear that in mind
>
> But here's the sequence as I understand it
> from the receiving node's viewpoint.
> 1> send the query out to one replica for
> each shard
> 2> get the top N doc IDs and scores (
> or whatever sorting criteria) from each
> shard.
> 3> Merge the lists and select the top N
> to return
> 4> request the actual documents for
> the top N list from each of the shards
> 5> return the list.
>
> So as you can see, there's an extra
> round trip to each shard to get the
> full document. Perhaps this is what
> you're seeing? <4> seems like it
> might be what you're seeing, I don't
> think it's counted in QTime.
>
> HTH
> Erick
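
Step <3> above (merging the per-shard top-N lists) can be sketched like this
(Python; the shard data and N are made up):

```python
import heapq
import itertools

def merge_top_n(per_shard_results, n):
    """Merge each shard's (score, doc_id) list, already sorted by
    descending score, and keep the global top N."""
    merged = heapq.merge(*per_shard_results, reverse=True)
    return list(itertools.islice(merged, n))

shard1 = [(0.9, "a"), (0.4, "b")]
shard2 = [(0.7, "c"), (0.1, "d")]
print(merge_top_n([shard1, shard2], n=3))
```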
>
> On Tue, Mar 11, 2014 at 3:17 AM, remi tassing 
> wrote:
> > Hi,
> >
> > I've just setup a SolrCloud with Tomcat. 5 Shards with one replication
> each
> > and total 10million docs (evenly distributed).
> >
> > I've noticed the query response time is faster than using one single node
> > but still not as fast as I expected.
> >
> > After turning debugQuery on, I noticed the query time is different to the
> > value returned in the debug explanation (see some excerpt below). More
> > importantly, while making a query to one, and only one, shard then the
> > result is consistent. It appears the server spends most of its time doing
> > result aggregation (merging).
> >
> > After searching on Google in vain I didn't find anything concrete except
> > that the problem could be in 'SearchComponent'.
> >
> > Could you point me in the right direction (e.g. configuration...)?
> >
> > Thanks!
> >
> > Remi
> >
> > Solr Cloud result:
> >
> > [...debug timing output elided; see the original message above...]
>


Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread remi tassing
Hi Varun,

I would just like to say that I have the same two problems you've mentioned
and I couldn't figure out a way to solve them.

For the 2nd I've posted a question a couple of days ago, title: "Result
merging takes too long"

Remi


On Thu, Mar 13, 2014 at 3:44 PM, Varun Rajput  wrote:

> I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
> machine with a zookeeper quorum running on 3 other machines. The index size
> on each shard is about 15GB. I noticed that the number of segments in
> second shard was 42 and in the remaining shards was between 25-30.
>
> I am basically trying to get the number of segments down to a reasonable
> size like 4 or 5 in order to improve the search time. We do have some
> documents indexed everyday, so we don't want to do an optimize every day.
>
> The merge factor with the TieredMergePolicy is only the number of segments
> per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
> shard, I tried clearing the index, reducing the mergeFactor and re-indexing
> the same data in the same manner, multiple times, but I don't see a pattern
> of reduction in number of segments.
>
> No mergeFactor set  => 42 segments
> mergeFactor=5  =>   22 segments
> mergeFactor=2  =>   22 segments
>
> Below is the simple configuration, as specified in the documentation, I am
> using for merging:
>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">2</int>
>     <int name="segmentsPerTier">2</int>
>   </mergePolicy>
>
>
> What is the best way in which I can use merging to restrict the number of
> segments being formed?
>
> Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
> see a great increase in response time from about 18ms to 150ms. Is this a
> known issue? Is there no way to reduce the response time? In the MBeans,
> the individual cores show the /select handler attributes having search
> times around 8ms. What is it that causes the overall response time to
> increase so much?
>
> -Varun
>


Re: Result merging takes too long

2014-03-16 Thread remi tassing
>So I have to ask what the end goal is here.
In our case, the purpose of sharding was/is to speed up the process.
We've noticed that as the index size was growing, response speed kept going
down so we decided to split the index across 5 machines.

>Are your response times really in need of improvement or is this more
trying to understand the process?
Our response time went from 1 second to 5+ seconds, so we thought we could
definitely do better with Solr(Cloud).

'start' and 'rows' are generally set to the default values (i.e., 0 and 10
respectively).

Any clues how to conduct further investigations?
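
One thing that may help make the aggregation overhead visible is asking each
shard to report its own time (parameter names are from stock Solr 4.x;
whether they isolate the merge cost here is a guess):

```
q=project development agile&fl=id&debug=timing&shards.info=true
```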


On Sun, Mar 16, 2014 at 7:29 AM, Erick Erickson wrote:

> I wouldn't expect the merge times to be significant
> at all, _assuming_ you're not doing something like
> setting a very high &start= parameter or returning
> a whole lot of rows.
>
> Now, it may be that you're sharding with too small
> a document set to really notice a difference.
> Sharding isn't really about speeding up responses,
> as it is being able to handle very large indexes.
>
> So I have to ask what the end goal is here. Are
> your response times really in need of improvement
> or is this more trying to understand the process?
>
> Best,
> Erick
>
> On Thu, Mar 13, 2014 at 1:19 AM, remi tassing 
> wrote:
> > Hi Erick,
> >
> > I've used the fl=id parameter to avoid retrieving the actual documents
> > (step <4> in your mail) but the problem still exists.
> > Any ideas on how to find the merging time(step <3>)?
> >
> > Remi
> >
> >
> > On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson  >wrote:
> >
> >> In SolrCloud there are a couple of round trips
> >> that _may_ be what you're seeing.
> >>
> >> First, though, the QTime is the time spent
> >> querying, it does NOT include assembling
> >> the documents from disk for return etc., so
> >> bear that in mind
> >>
> >> But here's the sequence as I understand it
> >> from the receiving node's viewpoint.
> >> 1> send the query out to one replica for
> >> each shard
> >> 2> get the top N doc IDs and scores (
> >> or whatever sorting criteria) from each
> >> shard.
> >> 3> Merge the lists and select the top N
> >> to return
> >> 4> request the actual documents for
> >> the top N list from each of the shards
> >> 5> return the list.
> >>
> >> So as you can see, there's an extra
> >> round trip to each shard to get the
> >> full document. Perhaps this is what
> >> you're seeing? <4> seems like it
> >> might be what you're seeing, I don't
> >> think it's counted in QTime.
> >>
> >> HTH
> >> Erick
> >>
> >> On Tue, Mar 11, 2014 at 3:17 AM, remi tassing 
> >> wrote:
> >> > Hi,
> >> >
> >> > I've just setup a SolrCloud with Tomcat. 5 Shards with one replication
> >> each
> >> > and total 10million docs (evenly distributed).
> >> >
> >> > I've noticed the query response time is faster than using one single
> node
> >> > but still not as fast as I expected.
> >> >
> >> > After turning debugQuery on, I noticed the query time is different to
> the
> >> > value returned in the debug explanation (see some excerpt below). More
> >> > importantly, while making a query to one, and only one, shard then the
> >> > result is consistent. It appears the server spends most of its time
> doing
> >> > result aggregation (merging).
> >> >
> >> > After searching on Google in vain I didn't find anything concrete
> except
> >> > that the problem could be in 'SearchComponent'.
> >> >
> >> > Could you point me in the right direction (e.g. configuration...)?
> >> >
> >> > Thanks!
> >> >
> >> > Remi
> >> >
> >> > Solr Cloud result:
> >> >
> >> > [...debug timing output elided; see the original message above...]
> >>
>


Re: Result merging takes too long

2014-04-13 Thread remi tassing
Hi,

After looking at SearchHandler.java I believe we might have a bug in the
debug information shown for distributed queries.

>@Erick: So as you can see, there's an extra round trip to each shard to
get the full document. Perhaps this is what you're seeing? <4> seems like
it might be what you're seeing, I don't think it's counted in QTime.

As Erick mentioned, the debug info is probably for one of the steps on
only one node, and not the 'aggregated' time (QTime represents the total
RTT, so it includes everything, I suppose).

Should a JIRA case be registered for this?

Remi


On Tue, Mar 18, 2014 at 12:41 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> That's great Jeff! Thanks for sharing your experience. SOLR-5768 will
> make it even better.
>
> https://issues.apache.org/jira/browse/SOLR-5768
>
> On Tue, Mar 18, 2014 at 3:35 AM, Jeff Wartes 
> wrote:
> >
> > This is highly anecdotal, but I tried SOLR-1880 with 4.7 for some tests I
> > was running, and saw almost a 30% improvement in latency. If you're only
> > doing document selection, it's definitely worth having.
> >
> > I'm reasonably certain that the patch would work in 4.6 too, but the test
> > file relies on some test infrastructure changes in 4.7, so I couldn't try
> > that without re-writing the test.
> >
> >
> >
> > On 3/16/14, 7:31 AM, "Shalin Shekhar Mangar" 
> > wrote:
> >
> >>On Thu, Mar 13, 2014 at 1:49 PM, remi tassing 
> >>wrote:
> >>>
> >>> I've used the fl=id parameter to avoid retrieving the actual documents
> >>> (step <4> in your mail) but the problem still exists.
> >>> Any ideas on how to find the merging time(step <3>)?
> >>
> >>Actually that doesn't skip steps #4 and #5. That optimization will be
> >>available in Solr 4.8 (the next release). See
> >>https://issues.apache.org/jira/browse/SOLR-1880
> >>
> >>--
> >>Regards,
> >>Shalin Shekhar Mangar.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: multi word search for elevator (QueryElevationComponent) not working

2014-04-16 Thread remi tassing
Hi Niranjan,

you should set it up so that the query matches the elevation criteria, for
example by elevating "apple ipod" as well as "ipod", or by changing the
"string" field type accordingly.
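
For example, the elevate.xml could list both spellings (the doc id is a
placeholder):

```xml
<elevate>
  <query text="ipod">
    <doc id="YOUR_DOC_ID" />
  </query>
  <query text="apple ipod">
    <doc id="YOUR_DOC_ID" />
  </query>
</elevate>
```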

Remi


On Mon, Apr 14, 2014 at 9:19 PM, Niranjan  wrote:

> Hi All,
>
> I have implemented a sponsor search where I have to elevate a particular
> document for a specific query text.
>
> To achieve this I have made the following changes (solr version:4.7.1):
>
> 1) Changes in solrConfig.xml
>
> <searchComponent name="elevator" class="solr.QueryElevationComponent">
>   <str name="queryFieldType">string</str>
>   <str name="config-file">elevate.xml</str>
> </searchComponent>
>
> <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>   </lst>
>   <arr name="last-components">
>     <str>elevator</str>
>   </arr>
> </requestHandler>
>
> 2) added the required doc id in elevate.xml:
>
> <elevate>
>   <query text="ipod">
>     <doc id="..." />
>   </query>
> </elevate>
>
>
> I am able to fetch the proper elevated result for the query text "ipod",
> but when I try to search for "apple ipod" (multi-word) the documents are
> not getting elevated, although my query contains the term "ipod".
>
> What is the proper way to configure the ElevationComponent so that it works
> both for a single-word query such as "ipod" and for a multi-word query such
> as "apple ipod"?
>
> Thanks in advance!!!
> Niranjan
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multi-word-search-for-elevator-QueryElevationComponent-not-working-tp4131016.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Default Search UI not working

2011-12-19 Thread remi tassing
Hello guys,
the default search UI doesn't work for me. http://localhost:8983/solr/browse 
gives me an HTTP 404 error.
I'm using Solr-1.4. Any idea how to fix this?
Remi

In-web search

2011-12-20 Thread remi tassing
Hi,
What is the query syntax for Solr to search within a specific site?
For example in google you can search like this: "Solr site:apache.org"
Remi

search within specific domain

2012-01-13 Thread remi tassing
Hello all,

I think it's possible with Solr to search within a specific domain (like
with Google). How is it done?

Ref:
http://support.google.com/websearch/bin/answer.py?hl=en&answer=136861&rd=1
*Search within a specific website (site:)*
Google allows you to specify that your search results must come from a
given website. For example, the query [ iraq site:nytimes.com ] will return
pages about Iraq but only from nytimes.com. The simpler queries [ iraq
nytimes.com ] or [ iraq New York Times ] will usually be just as good,
though they might return results from other sites that mention the New York
Times. You can also specify a whole class of sites, for example [ iraq
site:.gov ] will return results only from a .gov domain and [ iraq site:.iq
 ] will return results only from Iraqi sites.
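
In Solr the equivalent is a filter query on whatever field holds the host;
the field name below is an assumption (Nutch-style schemas often index the
host separately):

```
q=solr&fq=host:apache.org
```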


"index-time" over boosted

2012-01-18 Thread remi tassing
Hello all,

I've come across a problem where newly indexed pages almost always come
first, even when the term frequency is relatively low.

I read the posts below on "fieldNorm" and "omitNorms" but setting
"omitNorms=true" doesn't change anything for me on the calculation of
fieldNorm.

e.g.:
0.12333163 = (MATCH) weight(content:"mobil broadband" in 1004), product of:
  1.0 = queryWeight(content:"mobil broadband"), product of:
    6.3145795 = idf(content: mobil=4922 broadband=2290)
    0.15836367 = queryNorm
  0.12333163 = fieldWeight(content:"mobil broadband" in 1004), product of:
    1.0 = tf(phraseFreq=1.0)
    6.3145795 = idf(content: mobil=4922 broadband=2290)
    0.01953125 = fieldNorm(field=content, doc=1004)

These values are the same regardless of omitNorms's value.

Any idea what might be the problem?

[1]
http://lucene.472066.n3.nabble.com/QueryNorm-and-FieldNorm-td1992964.html
[2]
http://lucene.472066.n3.nabble.com/Question-about-fieldNorms-td504500.html


Re: "index-time" over boosted

2012-01-18 Thread remi tassing
Hi,

just a background on my setup. I'm crawling with Nutch-1.2, I used Solr-1.4
and Solr-3.5, with the same result. Solr is still using the default
settings.

I found this problem just by accident. I queried "mobile broadband"; page
A has 2 occurrences and scores higher than page B, which has 19 occurrences.
I found it weird, and that's why I started investigating.

The debug results are given below and you can see that queryWeight, idf
and queryNorm are the same, tf is higher, as expected, in B but what makes
the difference is clearly fieldNorm.

A: 0.010779975 = (MATCH) weight(content:"mobil broadband" in 18730), product of:
  1.0 = queryWeight(content:"mobil broadband"), product of:
    6.2444286 = idf(content: mobil=4922 broadband=2290)
    0.16014275 = queryNorm
  0.010779975 = fieldWeight(content:"mobil broadband" in 18730), product of:
    1.4142135 = tf(phraseFreq=2.0)
    6.2444286 = idf(content: mobil=4922 broadband=2290)
    0.0012207031 = fieldNorm(field=content, doc=18730)

B: 8.5223187E-4 = (MATCH) weight(content:"mobil broadband" in 14391), product of:
  1.0 = queryWeight(content:"mobil broadband"), product of:
    6.2444286 = idf(content: mobil=4922 broadband=2290)
    0.16014275 = queryNorm
  8.5223187E-4 = fieldWeight(content:"mobil broadband" in 14391), product of:
    4.472136 = tf(phraseFreq=20.0)
    6.2444286 = idf(content: mobil=4922 broadband=2290)
    3.0517578E-5 = fieldNorm(field=content, doc=14391)
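
For reference, classic Lucene's default norm is the index-time boost divided
by the square root of the field's term count (then squashed into a lossy
8-bit encoding), which is why it can dominate tf; a sketch:

```python
import math

def default_field_norm(num_terms, index_time_boost=1.0):
    """Classic Lucene DefaultSimilarity norm: boost * 1/sqrt(numTerms),
    before the lossy byte encoding is applied."""
    return index_time_boost / math.sqrt(num_terms)

# a very long field, or a small index-time document boost (crawlers such
# as Nutch set per-document boosts), both shrink fieldNorm dramatically
print(default_field_norm(1000))
print(default_field_norm(1000, index_time_boost=0.1))
```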

Remi

On Wed, Jan 18, 2012 at 8:52 PM, Jan Høydahl  wrote:

> > I've come across a problem where newly indexed pages almost always come
> > first even when the term frequency is relatively low.
>
> There is no inherent index-time boost, so this must be something else.
> Can you give us an example of a query? Which query parser do you use?
>
> > I read the posts below on "fieldNorm" and "omitNorms" but setting
> > "omitNorms=true" doesn't change anything for me on the calculation of
> > fieldNorm.
>
> Are you sure you have spelled omitNorms="true" correctly, then restarted
> Solr (to refresh config)? The effect of Norms on your score will be that
> shorter fields score higher than long fields.
>
> Perhaps you instead can try to tell us your use-case. What kind of ranking
> are you trying to achieve? Then we can help suggest how to get there.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
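Jan's remark about norms can be made concrete: in Lucene 3.x, DefaultSimilarity's lengthNorm is 1/sqrt(numTerms), so with norms enabled a short page gets a much larger fieldNorm than a long one. A sketch of the formula only (the stored value is additionally quantized to one byte):

```python
import math

def length_norm(num_terms):
    # Lucene 3.x DefaultSimilarity: shorter fields get a larger norm,
    # and therefore a larger score for the same tf and idf.
    return 1.0 / math.sqrt(num_terms)

short_page = length_norm(500)      # ~0.0447
long_page = length_norm(500_000)   # ~0.0014
print(short_page / long_page)      # ~31.6: the short page is heavily favored
```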


Re: "index-time" over boosted

2012-01-19 Thread remi tassing
Hello Jan,

My schema wasn't changed from the release 3.5.0. The content can be seen
below:

id
content



Remi

On Thu, Jan 19, 2012 at 1:28 PM, Jan Høydahl  wrote:

> Hi,
>
> Can you paste exactly both  and  definitions from your
> schema? omitNorms="true" should kill norms.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 19. jan. 2012, at 08:18, remi tassing wrote:
>
> > Hi,
> >
> > just a background on my setup. I'm crawling with Nutch-1.2, I used
> Solr-1.4
> > and Solr-3.5, with the same result. Solr is still using the default
> > settings.
> >
> > I found this problem just by accident. I queried "mobile broadband"; page
> > A has 2 occurrences and scores higher than page B, which has 19
> > occurrences. I found it weird and that's why I started investigating.
> >
> > The debug results are given below and you can see that queryWeight, idf
> > and queryNorm are the same, tf is higher, as expected, in B but what
> makes
> > the difference is clearly fieldNorm.
> >
> > A: 0.010779975 = (MATCH) weight(content:"mobil broadband" in 18730),
> > product of: 1.0 = queryWeight(content:"mobil broadband"), product of:
> > 6.2444286 = idf(content: mobil=4922 broadband=2290) 0.16014275 =
> queryNorm
> > 0.010779975 = fieldWeight(content:"mobil broadband" in 18730), product
> of:
> > 1.4142135 = tf(phraseFreq=2.0) 6.2444286 = idf(content: mobil=4922
> > broadband=2290) 0.0012207031 = fieldNorm(field=content, doc=18730)
> >
> > B: 8.5223187E-4 = (MATCH) weight(content:"mobil broadband" in 14391),
> > product of: 1.0 = queryWeight(content:"mobil broadband"), product of:
> > 6.2444286 = idf(content: mobil=4922 broadband=2290) 0.16014275 =
> queryNorm
> > 8.5223187E-4 = fieldWeight(content:"mobil broadband" in 14391), product
> of:
> > 4.472136 = tf(phraseFreq=20.0) 6.2444286 = idf(content: mobil=4922
> > broadband=2290) 3.0517578E-5 = fieldNorm(field=content, doc=14391)
> >
> > Remi
> >
> > On Wed, Jan 18, 2012 at 8:52 PM, Jan Høydahl 
> wrote:
> >
> >>> I've come across a problem where newly indexed pages almost always come
> >>> first even when the term frequency is relatively low.
> >>
> >> There is no inherent index-time boost, so this must be something else.
> >> Can you give us an example of a query? Which query parser do you use?
> >>
> >>> I read the posts below on "fieldNorm" and "omitNorms" but setting
> >>> "omitNorms=true" doesn't change anything for me on the calculation of
> >>> fieldNorm.
> >>
> >> Are you sure you have spelled omitNorms="true" correctly, then restarted
> >> Solr (to refresh config)? The effect of Norms on your score will be that
> >> shorter fields score higher than long fields.
> >>
> >> Perhaps you instead can try to tell us your use-case. What kind of
> >> ranking
> >> are you trying to achieve? Then we can help suggest how to get there.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
>
>


Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
Hi,

I tried everything I could, changed versions, but nada!

Is there a working tutorial on how to make Nutch, Solr and Solritas work?

Remi


Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
I get the error:
HTTP ERROR 400

Problem accessing /solr/browse. Reason:

undefined field cat

--
*Powered by Jetty://*

On Thu, Jan 19, 2012 at 2:44 PM, remi tassing  wrote:

> Hi,
>
> I tried everything I could, changed version but nada!
>
> Is there a working tutorial on how to make Nutch, Solr and Solritas work?
>
> Remi
>


Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
I think I get your point.

Is there any solrconfig.xml sample that works with nutch in a default
configuration?

Just something to start play with

Remi

On Thu, Jan 19, 2012 at 3:02 PM, Erik Hatcher wrote:

> /browse is defined in solrconfig.xml.  Its details need adjusting for
> datasets other than the example data that ships with Solr.  Templates may
> also need adjusting, but it does handle arbitrary facet fields automatically.
>
>   Erik
>
> On Jan 19, 2012, at 7:56, remi tassing  wrote:
>
> > I can get the error:
> > HTTP ERROR 400
> >
> > Problem accessing /solr/browse. Reason:
> >
> >undefined field cat
> >
> > --
> > *Powered by Jetty://*
> >
> > On Thu, Jan 19, 2012 at 2:44 PM, remi tassing 
> wrote:
> >
> >> Hi,
> >>
> >> I tried everything I could, changed version but nada!
> >>
> >> Is there a working tutorial on how to make Nutch, Solr and Solritas
> work?
> >>
> >> Remi
> >>
>


Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
Hey Nick,

could you plz create a new thread?

Remi

On Thu, Jan 19, 2012 at 3:35 PM, Nicholas Fellows wrote:

> Heya,
>
>  Question for you guys: I'm trying to use the Solr "analysis.jsp" tool
> to debug a Solr query.
>  I can't work out how to input sample data for the Field Value (Index)
> box when the data is multiValued.
>
> I was wondering if you could explain how to do this or point me to the
> documentation where this is explained
> (as i've not found anything helpful).
>
> I am of course assuming that the analyser supports multiValued data.
>
> Help Gratefully Appreciated!
>
> Cheers
>
> N ..
>
> --
> Nick Fellows
> DJdownload.com
>


Re: Just can't get Solritas to work, help!

2012-01-20 Thread remi tassing
So I erased my Solr folder and started from scratch.

From the example folder I ran "java -jar start.jar", but solrconfig.xml was
missing. I copied this file from Solr-3.4.0 to my Solr-3.5.0
folder.

Now http://localhost:8983/solr/admin works but
http://localhost:8983/solr/browse gives me this result instead of a GUI:

  
 - <http://localhost:8983/solr/browse#> 
 - <http://localhost:8983/solr/browse#> 
   0
   1297
   
  
   
 - <http://localhost:8983/solr/browse#> 
 - <http://localhost:8983/solr/browse#> 
   0
   0
  
 - <http://localhost:8983/solr/browse#> 
   
   
  
   
 - <http://localhost:8983/solr/browse#> 
 - <http://localhost:8983/solr/browse#> 
   
   50.0
   0.0
   600.0
   0
  
 - <http://localhost:8983/solr/browse#> 
   
   3
   0
   12
   0
  
 - <http://localhost:8983/solr/browse#> 
   
   +1YEAR
   2002-01-01T00:00:00Z
   2013-01-01T00:00:00Z
   0
   0
  
  
  
  

Plz help, this is exhausting

Remi

On Thu, Jan 19, 2012 at 3:02 PM, Erik Hatcher wrote:

> /browse is defined in solrconfig.xml.  Its details need adjusting for
> datasets other than the example data that ships with Solr.  Templates may
> also need adjusting, but it does handle arbitrary facet fields automatically.
>
>   Erik
>
> On Jan 19, 2012, at 7:56, remi tassing  wrote:
>
> > I can get the error:
> > HTTP ERROR 400
> >
> > Problem accessing /solr/browse. Reason:
> >
> >    undefined field cat
> >
> > --
> > *Powered by Jetty://*
> >
> > On Thu, Jan 19, 2012 at 2:44 PM, remi tassing 
> wrote:
> >
> >> Hi,
> >>
> >> I tried everything I could, changed version but nada!
> >>
> >> Is there a working tutorial on how to make Nutch, Solr and Solritas
> work?
> >>
> >> Remi
> >>
>


Re: Just can't get Solritas to work, help!

2012-01-20 Thread remi tassing
The tutorial works with Solr-3.4.0!

Should the tutorial be updated with newer versions?

Remi

On Friday, January 20, 2012, remi tassing  wrote:
> So I erased my Solr folder and started from scratch.
> From the example folder I "java -jar start.jar" but there was a
solrconfig.xml missing. I copied this file from Solr-3.4.0 to my Solr-3.5.0
folder.
> Now http://localhost:8983/solr/admin works but
http://localhost:8983/solr/browse gives me this result instead of a GUI:
>   
> - 
> - 
>   0
>   1297
>   
>   
>   
> - 
> - 
>   0
>   0
>   
> - 
>   
>   
>   
>   
> - 
> - 
>   
>   50.0
>   0.0
>   600.0
>   0
>   
> - 
>   
>   3
>   0
>   12
>   0
>   
> - 
>   
>   +1YEAR
>   2002-01-01T00:00:00Z
>   2013-01-01T00:00:00Z
>   0
>   0
>   
>   
>   
>   
> Plz help, this is exhausting
> Remi
> On Thu, Jan 19, 2012 at 3:02 PM, Erik Hatcher 
wrote:
>>
>> /browse is defined in solrconfig.xml.  Its details need adjusting for
datasets other than the example data that ships with Solr.  Templates may
also need adjusting, but it does handle arbitrary facet fields automatically.
>>
>>   Erik
>>
>> On Jan 19, 2012, at 7:56, remi tassing  wrote:
>>
>> > I can get the error:
>> > HTTP ERROR 400
>> >
>> > Problem accessing /solr/browse. Reason:
>> >
>> >undefined field cat
>> >
>> > --
>> > *Powered by Jetty://*
>> >
>> > On Thu, Jan 19, 2012 at 2:44 PM, remi tassing 
wrote:
>> >
>> >> Hi,
>> >>
>> >> I tried everything I could, changed version but nada!
>> >>
>> >> Is there a working tutorial on how to make Nutch, Solr and Solritas
work?
>> >>
>> >> Remi
>> >>
>
>


Re: Just can't get Solritas to work, help!

2012-01-21 Thread remi tassing
In the Solr-3.5 zip file I downloaded there was no solrconfig.xml

On Friday, January 20, 2012, Erik Hatcher  wrote:
>
> On Jan 20, 2012, at 13:23 , remi tassing wrote:
>> The tutorial works with Solr-3.4.0!
>
> It works for 3.5 too... via Jetty as prescribed by the tutorial. No?
>
>> Should the tutorial be updated with newer versions?
>
> Have you tried the instructions here?
>
>   http://www.lucidimagination.com/search/document/48b9e75fe68be4b7
>
> I'll fix 3.6 so that it doesn't have this issue (haven't done it yet, but
will ASAP), but it's purely a config issue really and can be made to work
just fine with Tomcat by a little environment or config tweak using Solr's
example configuration.  Let me know if you start from scratch and try the
suggestions mentioned in that link.
>
>Erik
>
>


Re: "index-time" over boosted

2012-01-22 Thread remi tassing
Hi,

I got it wrong in the beginning by putting omitNorms in the query URL.

Now following your advice, I merged the schema.xml from Nutch and Solr and
made sure omitNorms was set to "true" for the content, just as you said.

Unfortunately the problem remains :-(

On Thursday, January 19, 2012, Jan Høydahl  wrote:
> Hi,
>
> The schema you pasted in your mail is NOT Solr3.5's default example
schema. Did you get it from the Nutch project?
>
> And the "omitNorms" parameter is supposed to go in the  tag in
schema.xml, and the "content" field in the example schema does not have
omitNorms="true". Try to change
>
>   
> to
>   
>
> and try again. Please note that you SHOULD customize your schema, there
is really no "default" schema in Solr (or Nutch), it's only an example or
starting point. For your search application to work well you will have to
invest some time in designing a schema, working with your queries, perhaps
exploring DisMax query parser etc etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 19. jan. 2012, at 13:01, remi tassing wrote:
>
>> Hello Jan,
>>
>> My schema wasn't changed from the release 3.5.0. The content can be seen
>> below:
>>
>> 
>>
>>>sortMissingLast="true" omitNorms="true"/>
>>>omitNorms="true"/>
>>>omitNorms="true"/>
>>>positionIncrementGap="100">
>>
>>
>>>ignoreCase="true" words="stopwords.txt"/>
>>>generateWordParts="1" generateNumberParts="1"
>>catenateWords="1" catenateNumbers="1" catenateAll="0"
>>splitOnCaseChange="1"/>
>>
>>>protected="protwords.txt"/>
>>
>>
>>
>>>positionIncrementGap="100">
>>
>>
>>
>>>generateWordParts="1" generateNumberParts="1"/>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>

Re: "index-time" over boosted

2012-01-24 Thread remi tassing
Any idea?

This is a snippet of my schema.xml now:









   


...
   
   

 

 
 id

  > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> > On 19. jan. 2012, at 13:01, remi tassing wrote:
> >
> >> Hello Jan,
> >>
> >> My schema wasn't changed from the release 3.5.0. The content can be seen
> >> below:
> >>
> >> 
> >>
> >> >>sortMissingLast="true" omitNorms="true"/>
> >> >>omitNorms="true"/>
> >> >>omitNorms="true"/>
> >> >>positionIncrementGap="100">
> >>
> >>
> >> >>ignoreCase="true" words="stopwords.txt"/>
> >> >>generateWordParts="1" generateNumberParts="1"
> >>catenateWords="1" catenateNumbers="1" catenateAll="0"
> >>splitOnCaseChange="1"/>
> >>
> >> >>protected="protwords.txt"/>
> >>
> >>
> >>
> >> >>positionIncrementGap="100">
> >>
> >>
> >>
> >> >>generateWordParts="1" generateNumberParts="1"/>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> indexed="false"/>
> >> indexed="false"/>
> >>
> >>
> >>
> >>
> >>
> >>


Re: "index-time" over boosted

2012-01-24 Thread remi tassing
Hello,

thanks for helping out Jan, I really appreciate that!

These are full explains of two results:

Result#1.--

3.0412199E-5 = (MATCH) max of:
  3.0412199E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in
19081), product of:
0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
  0.5 = boost
  6.3531075 = idf(content: mobil=5270 broadband=2392)
  0.043826185 = queryNorm
2.1845297E-4 = fieldWeight(content:"mobil broadband" in 19081), product of:
  3.6055512 = tf(phraseFreq=13.0)
  6.3531075 = idf(content: mobil=5270 broadband=2392)
  9.536743E-6 = fieldNorm(field=content, doc=19081)

Result#2.-

2.6991445E-5 = (MATCH) max of:
  2.6991445E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in
15306), product of:
0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
  0.5 = boost
  6.3531075 = idf(content: mobil=5270 broadband=2392)
  0.043826185 = queryNorm
1.9388145E-4 = fieldWeight(content:"mobil broadband" in 15306), product of:
  1.0 = tf(phraseFreq=1.0)
  6.3531075 = idf(content: mobil=5270 broadband=2392)
  3.0517578E-5 = fieldNorm(field=content, doc=15306)

Remi
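These two explains can also be multiplied out to see where the ordering comes from: score = queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(phraseFreq) * idf * fieldNorm. A check against the numbers above (a sketch of the formula, not Solr's code):

```python
import math

idf = 6.3531075
query_norm = 0.043826185
boost = 0.5

query_weight = boost * idf * query_norm   # 0.13921623 in the explain

def score(phrase_freq, field_norm):
    field_weight = math.sqrt(phrase_freq) * idf * field_norm
    return query_weight * field_weight

result_1 = score(13.0, 9.536743e-6)   # ~3.0412199e-5
result_2 = score(1.0, 3.0517578e-5)   # ~2.6991445e-5

# Result #1 needs phraseFreq=13 just to edge out result #2's single match,
# because its fieldNorm is about 3.2x smaller.
print(result_1 > result_2)  # True
```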


On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl  wrote:

> That looks right. Can you restart your Solr, do a new search with
> &debugQuery=true and copy/paste the full EXPLAIN output for your query?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 24. jan. 2012, at 13:22, remi tassing wrote:
>
> > Any idea?
> >
> > This is a snippet of my schema.xml now:
> >
> > 
> > 
> >
> >
> > >required="true"/>
> > > omitNorms="true"/>
> >
> >
> >   
> > >multiValued="true"/>
> >
> > ...
> >   
> >   
> >
> > 
> >
> > 
> > id
> >
> >  >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>> Solr Training - www.solrtraining.com
> >>>
> >>> On 19. jan. 2012, at 13:01, remi tassing wrote:
> >>>
> >>>> Hello Jan,
> >>>>
> >>>> My schema wasn't changed from the release 3.5.0. The content can be
> seen
> >>>> below:
> >>>>
> >>>> 
> >>>>   
> >>>>>>>>   sortMissingLast="true" omitNorms="true"/>
> >>>>>>>>   omitNorms="true"/>
> >>>>>>>>   omitNorms="true"/>
> >>>>>>>>   positionIncrementGap="100">
> >>>>   
> >>>>   
> >>>>>>>>   ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>   generateWordParts="1" generateNumberParts="1"
> >>>>   catenateWords="1" catenateNumbers="1"
> catenateAll="0"
> >>>>   splitOnCaseChange="1"/>
> >>>>   
> >>>>>>>>   protected="protwords.txt"/>
> >>>>class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>   
> >>>>   
> >>>>>>>>   positionIncrementGap="100">
> >>>>   
> >>>>   
> >>>>   
> >>>>>>>>   generateWordParts="1" generateNumberParts="1"/>
> >>>>class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>   
> >>>>   
> >>>>   
> >>>>   
> >>>>   
> >>>>
> >>>>   
> >>>>>> indexed="false"/>
> >>>>>> indexed="false"/>
> >>>>   
> >>>>
> >>>>   
> >>>>   
> >>>>   
> >>>>>>
>
>


Re: "index-time" over boosted

2012-01-25 Thread remi tassing
Hi,

it worked (I'm using Solr-3.4.0, not that it matters)!!

I'll try to figure out what went wrong ...with my limited skills.

The solution omitNorms="true" works for now, but it's not a long-term
solution in my opinion. I also need to figure out how to make all that work.

Thanks again Jan!!

Remi
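For reference, the coarse fieldNorm values in these explains (3.0517578E-5, 0.0012207031, ...) are expected: Lucene 3.x stores each norm in a single byte with roughly a 3-bit mantissa, so many different field lengths collapse onto the same stored value. A simplified sketch of that quantization (an approximation of the encoding's precision, not Lucene's actual SmallFloat code):

```python
import math

def quantize_norm(f):
    # Keep only 3 fractional mantissa bits, mimicking the precision of
    # Lucene 3.x's one-byte norm encoding (simplified sketch).
    e = math.floor(math.log2(f))
    m = f / 2.0 ** e               # mantissa in [1, 2)
    return (round(m * 8) / 8.0) * 2.0 ** e

# Two noticeably different document lengths store the same norm:
n500 = quantize_norm(1 / math.sqrt(500))
n520 = quantize_norm(1 / math.sqrt(520))
print(n500, n520)  # both 0.04296875
```

The explain values themselves round-trip through this precision, which is consistent with their suspiciously "clean" magnitudes (3.0517578E-5 is 2^-15).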

On Tue, Jan 24, 2012 at 5:58 PM, Jan Høydahl  wrote:

> Hi,
>
> Well, I think you do it right, but get tricked by either editing the wrong
> file, a typo or browser caching.
> Why not try to start with a fresh Solr3.5.0, start the example app, index
> all exampledocs, search for "Podcasts", you get one hit, in fields "text"
> and "features".
> Then change solr/example/solr/conf/schema.xml and add omitNorms="true" to
> these two fields. Then stop Solr, delete your index, start Solr, re-index
> the docs and try again. fieldNorm is now 1.0. Once you get that working you
> can start debugging where you got it wrong in your own setup.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
>  On 24. jan. 2012, at 14:55, remi tassing wrote:
>
> > Hello,
> >
> > thanks for helping out Jan, I really appreciate that!
> >
> > These are full explains of two results:
> >
> > Result#1.--
> >
> > 3.0412199E-5 = (MATCH) max of:
> >  3.0412199E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in
> > 19081), product of:
> >0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
> >  0.5 = boost
> >  6.3531075 = idf(content: mobil=5270 broadband=2392)
> >  0.043826185 = queryNorm
> >2.1845297E-4 = fieldWeight(content:"mobil broadband" in 19081),
> product of:
> >  3.6055512 = tf(phraseFreq=13.0)
> >  6.3531075 = idf(content: mobil=5270 broadband=2392)
> >  9.536743E-6 = fieldNorm(field=content, doc=19081)
> >
> > Result#2.-
> >
> > 2.6991445E-5 = (MATCH) max of:
> >  2.6991445E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in
> > 15306), product of:
> >0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
> >  0.5 = boost
> >  6.3531075 = idf(content: mobil=5270 broadband=2392)
> >  0.043826185 = queryNorm
> >1.9388145E-4 = fieldWeight(content:"mobil broadband" in 15306),
> product of:
> >  1.0 = tf(phraseFreq=1.0)
> >  6.3531075 = idf(content: mobil=5270 broadband=2392)
> >  3.0517578E-5 = fieldNorm(field=content, doc=15306)
> >
> > Remi
> >
> >
> > On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl 
> wrote:
> >
> >> That looks right. Can you restart your Solr, do a new search with
> >> &debugQuery=true and copy/paste the full EXPLAIN output for your query?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 24. jan. 2012, at 13:22, remi tassing wrote:
> >>
> >>> Any idea?
> >>>
> >>> This is a snippet of my schema.xml now:
> >>>
> >>> 
> >>> 
> >>>   
> >>>   
> >>>>>>   required="true"/>
> >>>>>> omitNorms="true"/>
> >>>   
> >>>   
> >>>  
> >>>>>>   multiValued="true"/>
> >>>
> >>> ...
> >>>  
> >>>  
> >>>
> >>> 
> >>>
> >>> 
> >>> id
> >>>
> >>>  >>>>> Jan Høydahl, search solution architect
> >>>>> Cominvent AS - www.cominvent.com
> >>>>> Solr Training - www.solrtraining.com
> >>>>>
> >>>>> On 19. jan. 2012, at 13:01, remi tassing wrote:
> >>>>>
> >>>>>> Hello Jan,
> >>>>>>
> >>>>>> My schema wasn't changed from the release 3.5.0. The content can be
> >> seen
> >>>>>> below:
> >>>>>>
> >>>>>> 
> >>>>>>  
> >>>>>>   >>>>>>  sortMissingLast="true" omitNorms="true"/>
> >>>>>>   >>>>>>  omitNorms="true"/>
> >>>>>>   >>>>>>  omitNorms="true"/>
> >>>>>>   >>>>>>  positionIncrementGap="100">
> >>>>>>  
> >>>>>>  
> >>>>>>   >>>>>>  ignoreCase="true" words="stopwords.txt"/>
> >>>>>>   >>>>>>  generateWordParts="1" generateNumberParts="1"
> >>>>>>  catenateWords="1" catenateNumbers="1"
> >> catenateAll="0"
> >>>>>>  splitOnCaseChange="1"/>
> >>>>>>  
> >>>>>>   >>>>>>  protected="protwords.txt"/>
> >>>>>>   >> class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>  
> >>>>>>  
> >>>>>>   >>>>>>  positionIncrementGap="100">
> >>>>>>  
> >>>>>>  
> >>>>>>  
> >>>>>>   >>>>>>  generateWordParts="1" generateNumberParts="1"/>
> >>>>>>   >> class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>  
> >>>>>>  
> >>>>>>  
> >>>>>>  
> >>>>>>  
> >>>>>>
> >>>>>>  
> >>>>>>   >>>> indexed="false"/>
> >>>>>>   >>>> indexed="false"/>
> >>>>>>   indexed="false"/>
> >>>>>>
> >>>>>>  
> >>>>>>  
> >>>>>>   indexed="true"/>
> >>>>>>   >>>>
> >>
> >>
>
>


Solr on remote server

2012-01-28 Thread remi tassing
Hi,

The example works well on the local machine, but how do I make it work on a
remote server? Do I have to install Jetty or Tomcat...?

Remi


Re: Writing a french Solr book - Ecrire un livre en français

2012-01-29 Thread remi tassing
I haven't seen any.

Have you thought of translating one?

Remi

On Sunday, January 29, 2012, SR  wrote:
> My main question is whether there's already a French book or not.
>
>
> On Jan 29, 2012, at 10:01 AM, Abhishek Tyagi wrote:
>
>> If you are thinking about it then do it; why do you want people to tell you
>> what you should do.
>>
>> bestaluck!
>>
>> On Sun, Jan 29, 2012 at 8:20 PM, SR  wrote:
>>
>>> Hi there,
>>>
>>> Have you heard of any existing Solr book in French? If no, I'm thinking
of
>>> writing one. Do you think this could be useful for francophone
community?
>>>
>>> Thanks
>>> -SR
>>
>>
>>
>>
>> --
>> Abhishek Tyagi
>> Let's just say.. I'm the Frankenstein's Monster.
>
>


search returns 'categories' instead of url

2012-01-29 Thread remi tassing
Hi,

Let's say Solr is set up and can return relevant URLs. What if I wanted to
get the most-cited terms from a predefined list instead? It could be from
a list of products, names, cities...

Any ideas?

Remi


Re: search returns 'categories' instead of url

2012-01-31 Thread remi tassing
After looking at the Carrot2 introduction, it seems this can be solved with
clustering but with pre-defined categories.

Does that make sense?

Remi

On Sun, Jan 29, 2012 at 8:42 PM, remi tassing  wrote:

> Hi,
>
> Let's say Solr is setup and can return relevant urls. What if I wanted to
> get the most cited terms from a predefined list, instead? It could be from
> a list of products, names, cities...
>
> Any ideas?
>
> Remi


Re: search returns 'categories' instead of url

2012-02-01 Thread remi tassing
This topic is either boring or not clear enough...

An alternative solution would be to add a category field to the
already-crawled content.

Any idea how to do it?

Remi

On Tuesday, January 31, 2012, remi tassing  wrote:
> After looking at the Carrot2 introduction, it seems this can be solved
with clustering but with pre-defined categories.
> Does that make sense?
> Remi
>
> On Sun, Jan 29, 2012 at 8:42 PM, remi tassing 
wrote:
>>
>> Hi,
>>
>> Let's say Solr is setup and can return relevant urls. What if I wanted
to get the most cited terms from a predefined list, instead? It could be
from a list of products, names, cities...
>>
>> Any ideas?
>>
>> Remi
>


Re: search returns 'categories' instead of url

2012-02-02 Thread remi tassing
Sincere apologies for the unclarity! I'm probably misusing technical terms
such as 'category'...

OK, let's assume we have the basic Solr engine that's able to search and
return URLs as results... now, from those pages, I would like to know which
terms are mentioned most, e.g. iPad, Samsung, Candy... the list can be
long, but we could decide to only output the top 20 or so.

I'm not sure if this a more 'facet' or 'category' or 'cluster' job in Solr
terminology.

Remi

On Thursday, February 2, 2012, Chris Hostetter 
wrote:
>
> : > Another alternative solution would be to add a category field to the
> : > already crawled content.
>
> : > >> Let's say Solr is setup and can return relevant urls. What if I
wanted
> : > to get the most cited terms from a predefined list, instead? It could
be
> : > from a list of products, names, cities...
>
> You really need to explain your problem more -- I'm having a hard time
> understanding what type of use case/situation you might be describing.
>
> based on your initial description it seems like you are just asking about
> something like using facet.query to get counts for specific terms; but
> then in your followup the idea of adding categorization to your existing
> index almost smells like a machine-learning type problem.
>
> the question is just really too vague to make any guesses at.
>
> please give specific examples of the type of data you are working with,
> the types of requests you want to send, the types of results you want to
> give back from those requests, and the types of results you do *NOT*
> want to get back, so we can understand the boundaries of your problem.
>
>
> -Hoss
>
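The facet.query approach Hoss mentions boils down to: for each term in a predefined list, ask the index how many matching documents contain it, then rank the counts. A rough pure-Python approximation of that idea (illustrative only; in Solr you would send one facet.query=content:term parameter per term, and Solr would count postings instead of scanning text):

```python
docs = [
    "the new ipad beats the samsung tablet",
    "samsung announces a new phone",
    "candy sales rise while ipad sales fall",
]
terms = ["ipad", "samsung", "candy"]

# Count how many documents mention each predefined term, then rank.
counts = {t: sum(t in doc.split() for doc in docs) for t in terms}
top = sorted(counts.items(), key=lambda kv: -kv[1])
print(top)  # [('ipad', 2), ('samsung', 2), ('candy', 1)]
```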


Re: Solritas: Modify $content in layout.vm

2012-02-18 Thread remi tassing
Yes, I'm using the example configuration (Solr-3.4).

What I'm trying to do is to remove the menus on the left side ("Query
Facets", "Range Facets", "Clusters"), and the "boost by price" button. I'm
not using them for now and they're kind of distracting.

Thanks, again, in advance!

Remi

On Fri, Feb 17, 2012 at 11:56 PM, Erik Hatcher wrote:

> $content is output of the main template rendered.
>
> To modify what is generated into $content, modify the main template or the
> sub-#parsed templates (which is what you've discovered, looks like) that is
> rendered (browse.vm, perhaps, if you're using the default example setup).
>  The main template that is rendered is specified as v.template (in the
> /browse handler definition in solrconfig.xml, again if you're using the
> example configuration).
>
> Does that help?  If not, let us know what you're trying to do exactly.
>
>Erik
>
>
>
>
> On Feb 16, 2012, at 23:06 , remi tassing wrote:
>
> > Hi all,
> >
> > How do we modify the "$content" variable in the layout.vm file? I
> > managed to change other stuff in doc.vm or header.vm but not this one.
> >
> > Is there any tutorial on this?
> >
> > Remi
>
>


Re: Solritas: Modify $content in layout.vm

2012-02-19 Thread remi tassing
Yeah, that works for now. I'll check that $content thing later on.

Thanks man!

Remi

On Sunday, February 19, 2012, Erik Hatcher  wrote:
> Unfortunately things are a bit messy in there because others have tried to
make a kitchen sink of things in there, but as I said, it all starts with
browse.vm and then follow any #parse's from there.  You'll see browse.vm
#parse's "facets.vm", and in there you'll see how it then #parse's to those
various pieces you mention, so you can just remove, say, the
#parse('cluster.vm') in there (and subsequently clean it up and actually
remove the now unused cluster.vm file if you like).  And so on.
>
>Erik
>
>
> On Feb 18, 2012, at 10:23 , remi tassing wrote:
>
>> Yes, I'm using the example configuration (Solr-3.4).
>>
>> What I'm trying to do is to remove the menus on the left side ("Query
>> Facets", "Range Facets", "Clusters"), and the "boost by price" button.
I'm
>> not using them for now and they're kind of distracting.
>>
>> Thanks, again, in advance!
>>
>> Remi
>>
>> On Fri, Feb 17, 2012 at 11:56 PM, Erik Hatcher wrote:
>>
>>> $content is output of the main template rendered.
>>>
>>> To modify what is generated into $content, modify the main template or
the
>>> sub-#parsed templates (which is what you've discovered, looks like)
that is
>>> rendered (browse.vm, perhaps, if you're using the default example
setup).
>>> The main template that is rendered is specified as v.template (in the
>>> /browse handler definition in solrconfig.xml, again if you're using the
>>> example configuration).
>>>
>>> Does that help?  If not, let us know what you're trying to do exactly.
>>>
>>>   Erik
>>>
>>>
>>>
>>>
>>> On Feb 16, 2012, at 23:06 , remi tassing wrote:
>>>
>>>> Hi all,
>>>>
>>>> How do we modify the "$content" variable in the layout.vm file? I
>>>> managed to change other stuff in doc.vm or header.vm but not this one.
>>>>
>>>> Is there any tutorial on this?
>>>>
>>>> Remi
>>>
>>>
>
>


Re: need to support bi-directional synonyms

2012-02-22 Thread remi tassing
Same question here...

On Wednesday, February 22, 2012, geeky2  wrote:
> hello all,
>
> i need to support the following:
>
> if the user enters "sprayer" in the desc field - then they get results for
> BOTH "sprayer" and "washer".
>
> and in the other direction
>
> if the user enters "washer" in the desc field - then they get results for
> BOTH "washer" and "sprayer".
>
> would i set up my synonym file like this?
>
> assuming expand = true..
>
> sprayer => washer
> washer => sprayer
>
> thank you,
> mark
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/need-to-support-bi-directional-synonyms-tp3767990p3767990.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
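For what it's worth: with expand=true, the single comma-separated line `sprayer, washer` already behaves bidirectionally, because every member of the group is expanded to the whole group; the pair of `=>` rules is only needed with explicit mappings, since each `=>` rule is one-directional. A small simulation of the equivalence-group behavior (an illustration of the semantics, not Solr's SynonymFilter code):

```python
# Each comma-separated synonym line (with expand=true) forms one
# equivalence group: any member matches every member.
groups = [{"sprayer", "washer"}]

def expand(term):
    out = {term}
    for group in groups:
        if term in group:
            out |= group
    return out

print(sorted(expand("sprayer")))  # ['sprayer', 'washer']
print(sorted(expand("washer")))   # ['sprayer', 'washer']
print(sorted(expand("dryer")))    # ['dryer'] -- unrelated terms untouched
```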