date:20100528


Hello !

I am trying to apply the solr-236 patch to the sources I got from svn. I
downloaded the sources from path-to-repository/tags/release-1.4.0

I tried to apply this patch
https://issues.apache.org/jira/secure/attachment/12444611/SOLR-236-trunk.patch
and this one :
https://issues.apache.org/jira/secure/attachment/12434435/SOLR-236.patch
with eclipse

but when I compile I get compilation errors. I guess that I took the wrong
patch (or the wrong project). I also tried to apply the patches to the
sources I got from solr/branches/branch1.4, but that didn't work either.

Regards

Sophie
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851121.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Tagging Facet Queries -- Urgent Help Required

2010-05-28 Thread Erik Hatcher

You've tagged facet queries, but it looks like you want to use the
exclude capability (the "ex" local parameter) on your facet queries, with
the tags kept on the filter queries.  Filter queries are additive,
constraining the results further for each one, and by default faceting is
based off the search results.  Use ex to have facets count outside the
actual constrained search results.


Erik
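
A minimal sketch of the tag/exclude pairing described above, written as
plain Java string building so that it runs without SolrJ. The local
parameter Solr uses for exclusion is spelled "ex": the filter query carries
{!tag=...} and the facet query carries {!ex=...}. The class and method names
are hypothetical; the tags and field names are the ones from the original
question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds raw parameter values only, no Solr dependency.
final class TaggedFacetParams {

    // Filter query tagged so that facets can refer to it by name.
    static String taggedFilter(String tag, String query) {
        return "{!tag=" + tag + "}" + query;
    }

    // Facet query counted as if the filters carrying the given tags were not applied.
    static String excludingFacet(String excludedTags, String key, String query) {
        return "{!ex=" + excludedTags + " key=" + key + "}" + query;
    }

    public static void main(String[] args) {
        List<String> filters = new ArrayList<>();
        filters.add(taggedFilter("NE", "med:Blog AND slev:neutral"));
        filters.add(taggedFilter("P", "med:Review AND slev:neutral"));

        List<String> facetQueries = new ArrayList<>();
        // Each facet excludes both filters; otherwise the two filters intersect
        // and every count is taken over the same narrowed result set.
        facetQueries.add(excludingFacet("NE,P", "BLOG", "med:Blog AND slev:neutral"));
        facetQueries.add(excludingFacet("NE,P", "Review", "med:Review AND slev:neutral"));

        filters.forEach(v -> System.out.println("fq=" + v));
        facetQueries.forEach(v -> System.out.println("facet.query=" + v));
    }
}
```

With SolrJ, the same strings would be passed to addFilterQuery and
addFacetQuery exactly as in the snippet quoted below.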

On May 28, 2010, at 4:17 AM, Ninad Raut wrote:


Hi All,
I have a use case where I have to tag facet queries.

Here is the code snippet for what I tried:
query.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
query.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query.addFacetQuery("{!tag=P key=Review}med:Review AND slev:neutral");

The result was {BLOG=0, Review=0}

but when I run separate queries :

query1.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query1.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
and
query2.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query2.addFacetQuery("{!tag=P key=Forum}med:Review AND slev:neutral");

I get correct results.
{BLOG=98} and {Forum=830} respectively.

I want to do this in a single query (with multiple facets). Is there some
other way of tagging facet queries?

Can any one help me with this?

Regards,
Ninad R

Solr spellchecker field

2010-05-28 Thread Dejan Noveski

Hi,

Does the field that is used for spellchecker indexing need to be stored
and/or indexed? These fields have become fairly large in my index, and PHP
won't parse/decode the documents returned.

-- 
--
Dejan Noveski
Web Developer
dr.m...@gmail.com
Twitter: http://twitter.com/dekomote | LinkedIn:
http://mk.linkedin.com/in/dejannoveski

Re: Solr spellchecker field

2010-05-28 Thread Erik Hatcher

A field used to build a spellcheck index only needs to be indexed, not  
stored.


But, your PHP issue could be alleviated anyway by simply customizing  
the fl parameter and excluding the large stored field.  This is often  
desirable for large fields that are never needed fully in the UI, but  
used internally for highlighting.


Erik
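
As a small illustration of the fl suggestion above, the sketch below builds
a select URL that asks only for the fields the UI needs, so the large stored
field is never sent to the client. The host, the handler path and the field
names (id, title) are placeholders and are not taken from the original
poster's setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; needs Java 10+ for URLEncoder.encode(String, Charset).
final class FieldListUrl {

    // Builds a select URL that returns only the listed fields.
    static String selectUrl(String baseUrl, String query, String... fields) {
        String fl = String.join(",", fields);
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode(fl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder field names: list everything except the large field.
        System.out.println(selectUrl("http://localhost:8983/solr", "*:*", "id", "title"));
    }
}
```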

On May 28, 2010, at 4:47 AM, Dejan Noveski wrote:


Hi,

Does the field that is used for spellchecker indexing need to be stored
and/or indexed? These fields have become fairly large in my index, and PHP
won't parse/decode the documents returned.

--
--
Dejan Noveski
Web Developer
dr.m...@gmail.com
Twitter: http://twitter.com/dekomote | LinkedIn:
http://mk.linkedin.com/in/dejannoveski

Re: Solr spellchecker field

2010-05-28 Thread Dejan Noveski

Thank you very much!

On Fri, May 28, 2010 at 10:57 AM, Erik Hatcher wrote:

> A field used to build a spellcheck index only needs to be indexed, not
> stored.
>
> But, your PHP issue could be alleviated anyway by simply customizing the fl
> parameter and excluding the large stored field.  This is often desirable for
> large fields that are never needed fully in the UI, but used internally for
> highlighting.
>
>Erik
>
>
> On May 28, 2010, at 4:47 AM, Dejan Noveski wrote:
>
>  Hi,
>>
>> Does the field that is used for spellchecker indexing need to be stored
>> and/or indexed? These fields became fairly large in my index, and php wont
>> parse/decode the documents returned.
>>
>> --
>> --
>> Dejan Noveski
>> Web Developer
>> dr.m...@gmail.com
>> Twitter: http://twitter.com/dekomote | LinkedIn:
>> http://mk.linkedin.com/in/dejannoveski
>>
>
>


-- 
--
Dejan Noveski
Web Developer
dr.m...@gmail.com
Twitter: http://twitter.com/dekomote | LinkedIn:
http://mk.linkedin.com/in/dejannoveski

Re: Applying collapse patch

2010-05-28 Thread Peter Karich

I had success with a previous version (~ 12/2009). Try asking directly
in the comments on the patch.
I got help there immediately.

Regards,
Peter.

> Hello !
>
> I am trying to apply the solr-236 patch to the sources I got from svn. I
> downloaded the sources from path-to-repository/tags/release-1.4.0
>
> I tried to apply this patch
> https://issues.apache.org/jira/secure/attachment/12444611/SOLR-236-trunk.patch
> and this one :
> https://issues.apache.org/jira/secure/attachment/12434435/SOLR-236.patch
> with eclipse
>
> but when I compile I get compilation errors. I guess that I took the wrong
> patch (or the wrong project). I also tried to apply the patches to the
> sources I got from solr/branches/branch1.4, but that didn't work either.
>
> Regards
>
> Sophie
>

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Erik Hatcher

Here's a slider example that narrows down how many tags/facets are
displayed:

How about a tree map? See my slides from the prototyping preso at
EuroCon last week:

Pie in the sky, how about pie charts? I like 'em :) From Koji's blog
(but his demo site is currently down):

Perhaps only tangentially related, but one thing I really prefer in a
rich search UI is the ability to invert constraints. Show me
everything about "flare", now narrow to the science category. Now
invert it, for everything _not_ in the science category. This plays
in with how facets are used for drilling in (or broadening!) the
search experience. Ahhh, serendipity!

Erik

On May 27, 2010, at 4:50 PM, Mark Bennett wrote:
I'm a big fan of plain old text facets (or tags), displayed in some logical
order, perhaps with a bit of indenting to help convey context. But as you
may have noticed, I don't rule the world. :-)

Suppose you took the opposite approach, rendering facets in non-traditional
ways, that were still functional, and not ugly.

Are there any public sites that come to mind that are displaying facets,
tags, clusters, taxonomies or other navigators in really innovative ways?
And what you liked / didn't like?

Right now I'm just looking for examples of what's been tried. I suppose
even bad examples might be educational.

My future ideal wish list:
* Stays out of the way (of casual users)
* Looks "clean" and "cool" (to the power users)
  I'm thinking for example a light gray chevron ">>" that casual users
  don't notice, but when you click on it, cool things come up?
* Probably that does not require Flash or SilverLight (just to avoid the
  whole platform wars)
  I guess that means Ajax or HTML5
* And since I'm doing pie in the sky, can be made to look good on desktops
  and mobile

Some examples to get the ball rolling:

StackOverflow, Flickr and YouTube, Clusty (now Yippy) are all nice, but a bit
pedestrian for my mission today.
(grokker was cool too)

Lucid has done a nice job with Facets and Solr:
http://www.lucidimagination.com/search/
And although I really like it, it's not a flashy enough specimen for what
I'm hunting today.
(and they should thread the actual results list)

I did some mockups of "2.0 style" search navigators a couple years back:
http://www.ideaeng.com/tabId/98/itemId/115/Search-20-in-the-Enterprise-Moving-Beyond-Singl.aspx
Though these were intentionally NOT derived from specific web sites.

Digg has done some cool stuff, for example:
http://labs.digg.com/365/
http://labs.digg.com/arc/
http://labs.digg.com/stack/
But for what I'm after, these are a bit too far off of the "searching for
something in particular" track.

Google Image Swirl and Similar Images are interesting, but for images.
Lots of other cool stuff at labs.google.com

Amazon, NewEgg, etc are all fine, but again text based.

TouchGraph has some cool stuff, though very non-linear (many others on this
theme)
http://www.touchgraph.com/TGGoogleBrowser.html
http://www.touchgraph.com/navigator.html

Cool articles on the subject: (some examples now offline)
http://www.cs.umd.edu/class/spring2005/cmsc838s/viz4all/viz4all_a.html

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

Re: Applying collapse patch


OK, I will have a look at the comments and I will post there if necessary.

Thanks ^^

Re: Tagging Facet Queries -- Urgent Help Required

2010-05-28 Thread Ninad Raut

Thanks Erik,


On Fri, May 28, 2010 at 2:17 PM, Erik Hatcher wrote:

> You've tagged facet queries, but it looks like you want to use the
> exclude capability (the "ex" local parameter) on your facet queries, with
> the tags kept on the filter queries.  Filter queries are additive,
> constraining the results further for each one, and by default faceting is
> based off the search results.  Use ex to have facets count outside the
> actual constrained search results.
>
>Erik
>
>
> On May 28, 2010, at 4:17 AM, Ninad Raut wrote:
>
>  Hi All,
>> I have a use case where I have to tag facet queries.
>>
>> Here is the code snippet for what I tried:
>> query.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
>> query.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
>> query.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
>> query.addFacetQuery("{!tag=P key=Review}med:Review AND slev:neutral");
>>
>> The result was {BLOG=0, Review=0}
>>
>> but when I run separate queries :
>>
>> query1.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
>> query1.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
>> and
>> query2.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
>> query2.addFacetQuery("{!tag=P key=Forum}med:Review AND slev:neutral");
>>
>> I get correct results.
>> {BLOG=98} and {Forum=830} respectively.
>>
>> I want to do this in a single query (with multiple facets). Is there some
>> other way of tagging facet queries?
>>
>> Can any one help me with this?
>>
>> Regards,
>> Ninad R
>>
>
>

SolrException: No such core

2010-05-28 Thread jfmnews

With embedded Solr (1.3.0), a SolrException sometimes occurs.
I don't understand why: I have not been able to find a scenario that reproduces it.


org.apache.solr.common.SolrException: No such core: core0
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
    at org.apache.solr.client.solrj.SolrServer.deleteById(SolrServer.java:97)

Regards

JF

Re: Applying collapse patch

2010-05-28 Thread Martijn v Groningen

The trunk should work with the latest patch (SOLR-236-trunk.patch).
Did the patch apply successfully? What compilation errors do you get?

On 28 May 2010 11:10, Sophie M.  wrote:
>
> Ok I will have a look on the comments and I will post if necessary.
>
> Thanks ^^
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851170.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Kind regards,

Martijn van Groningen

Re: Applying collapse patch


Hi,

I am checking out the trunk. I will try to apply the patch and compile, and
I will tell you what I get.

Thanks

Re: Applying collapse patch


Applying the patch went fine, thanks Martijn. When I start Solr I get these
logs in my console:

C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr>java -jar
start.jar
2010-05-28 12:09:30.037:INFO::Logging to STDERR via
org.mortbay.log.StdErrLog
2010-05-28 12:09:30.178:INFO::jetty-6.1.22
2010-05-28 12:09:30.208:INFO::Opened
C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr\logs\2010_05_28.request.log
2010-05-28 12:09:30.218:INFO::Started socketconnec...@0.0.0.0:8983

and I have a 404 error on http://localhost:8983/solr

I am investigating ^^

regards


Re: Applying collapse patch

2010-05-28 Thread Martijn v Groningen

Have you executed: "ant example" after building? (Assuming that this
is the example solr)

On 28 May 2010 12:17, Sophie M.  wrote:
>
> It is ok for applying the patch, thanks Martin. When I start Solr I get this
> logs in my console :
>
> C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr>java -jar
> start.jar
> 2010-05-28 12:09:30.037:INFO::Logging to STDERR via
> org.mortbay.log.StdErrLog
> 2010-05-28 12:09:30.178:INFO::jetty-6.1.22
> 2010-05-28 12:09:30.208:INFO::Opened
> C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr\logs\2010_05_28.request.log
> 2010-05-28 12:09:30.218:INFO::Started socketconnec...@0.0.0.0:8983
>
> and I have a 404 error on http://localhost:8983/solr
>
> I am investigating ^^
>
> regards
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851308.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Kind regards,

Martijn van Groningen

Re: XSLT for JSON

2010-05-28 Thread Markus.Rietzler

OK, but is there an easy way to influence the format of the JSON output,
e.g. field order, names, etc.? Maybe I want to group the results differently
or add some extra information.

> -Original Message-
> From: Jon Baer [mailto:jonb...@gmail.com]
> Sent: Wednesday, 26 May 2010 19:39
> To: solr-user@lucene.apache.org
> Subject: Re: XSLT for JSON
> 
> You should already get this out of the box ... just tack on a
> wt=json to the params, i.e. ...
> 
> http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true&wt=json
> 
> If you look at /apache-solr-1.4.0/contrib/velocity/src/main
> you will see another writer which lets you use templates for
> formatting any way you wish.
> 
> Then you would end up with wt=velocity&velocity.template=mytemplate
> 
> - Jon
> 
> On May 26, 2010, at 1:03 PM, stockii wrote:
> 
> > 
> > Hello.
> > 
> > I have a little/big problem. 
> > 
> > I want to change the response format from the TermsComponent. Is it
> > possible to change it with XSLT from XML to my JSON format? Or with XSLT
> > from JSON to JSON ... ;-)
> > 
> > The new JSON format should be exactly the same as the standard response ...
> > 
> > thx
> > -- 
> > View this message in context: 
> http://lucene.472066.n3.nabble.com/XSLT-for-JSON-tp845386p845386.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> 
>

Re: Applying collapse patch


I don't know how, but it works now. Thanks

Re: highlighting broken for multivalued text fields?

2010-05-28 Thread Darren Govoni

Ah, ok. Well, the fieldType was "text", untouched from the Solr default.

hth,
Darren

On Thu, 2010-05-27 at 22:06 -0700, Chris Hostetter wrote:

> : Hi Koji,
> :Well, its quite simple. Here is the field returned from my query:
> : "fox"
> 
> Actually what Koji was asking for was the <fieldType/> declaration for
> "text" (you posted the <field/> but not the <fieldType/>, so we only have
> half a picture of the settings involved)
> 
> That said: the subject of this thread caught my eye, because it sounds
> very similar to a known bug in 1.4 that has been fixed in svn (which I
> just happened to be looking at because I was cleaning up Jira) ...
> 
> https://issues.apache.org/jira/browse/SOLR-1624
> 
> -Hoss
>

Re: Solr spellchecker field

2010-05-28 Thread Israel Ekpo

Dejan,

How are you making the calls from PHP to Solr?

I am curious to know why the documents could not be parsed.

On Fri, May 28, 2010 at 5:00 AM, Dejan Noveski  wrote:

> Thank you very much!
>
> On Fri, May 28, 2010 at 10:57 AM, Erik Hatcher  >wrote:
>
> > A field used to build a spellcheck index only needs to be indexed, not
> > stored.
> >
> > But, your PHP issue could be alleviated anyway by simply customizing the fl
> > parameter and excluding the large stored field.  This is often desirable for
> > large fields that are never needed fully in the UI, but used internally for
> > highlighting.
> >
> >Erik
> >
> >
> > On May 28, 2010, at 4:47 AM, Dejan Noveski wrote:
> >
> >  Hi,
> >>
> >> Does the field that is used for spellchecker indexing need to be stored
> >> and/or indexed? These fields became fairly large in my index, and php wont
> >> parse/decode the documents returned.
> >>
> >> --
> >> --
> >> Dejan Noveski
> >> Web Developer
> >> dr.m...@gmail.com
> >> Twitter: http://twitter.com/dekomote | LinkedIn:
> >> http://mk.linkedin.com/in/dejannoveski
> >>
> >
> >
>
>
> --
> --
> Dejan Noveski
> Web Developer
> dr.m...@gmail.com
> Twitter: http://twitter.com/dekomote | LinkedIn:
> http://mk.linkedin.com/in/dejannoveski
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Custom sorting

2010-05-28 Thread Fornoville, Tom

Hello everyone,

 

I'm new to Solr but have been asked to do an evaluation as an
alternative for a commercial search engine.

I have some experience with Lucene and a java background so I'm not
afraid to dive into code :-)

 

The application now has a very particular way of sorting results using
something called "buckets".

 

I'll try to explain with a bit of details:

In the interface they have 2 fields: "what" and "where".

Both fields are actually sets of fields (what = category, name, contact
info... and where = country, state, region, city...) so the copyField
feature of Solr immediately comes to mind.

Now, based on which field generated the actual match, the result should end
up in a specific bucket.

In particular the first bucket contains all the result documents that
have an exact match on the category field, in the second bucket all
exact matches on name, the third partial matches on category, the fourth
partial matches on name, the fifth matches on contact info etc...

Then within each of those first tier buckets all results are placed in
second tier buckets depending on what location was matched: city, then
region, then province and so on.

To complicate things even more, there is also a third tier bucket where
results are placed according to the value of a ranking field: all
documents with the value 1 in the ranking field go in bucket 1 and so
on.

And finally results should be randomized in the third tier bucket...

On top of this they obviously want support for facets and paging.

 

My apologies for the long mail but I would greatly appreciate feedback
and/or suggestions.

I'm aware that this is a very particular problem, but
anything that points me in the right direction is helpful.

 

Cheers,

Tom
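
The tiered ordering described above can be expressed as a chained
comparator. This is only a plain-Java sketch of the ordering itself, not a
Solr configuration: the Hit class, the tier numbers and the seed are
assumptions made for illustration, and how each tier is derived from the
matching field is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class BucketSort {

    // Hypothetical result holder: lower tier numbers rank first.
    static final class Hit {
        final String id;
        final int whatTier;   // 1 = exact category, 2 = exact name, 3 = partial category, ...
        final int whereTier;  // 1 = city, 2 = region, 3 = province, ...
        final int ranking;    // value of the ranking field

        Hit(String id, int whatTier, int whereTier, int ranking) {
            this.id = id;
            this.whatTier = whatTier;
            this.whereTier = whereTier;
            this.ranking = ranking;
        }
    }

    static List<Hit> order(List<Hit> hits, long seed) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Shuffle first, then apply a stable sort: hits that tie on all three
        // tiers keep their shuffled order, i.e. they are randomized in-bucket.
        Collections.shuffle(sorted, new Random(seed));
        sorted.sort(Comparator.comparingInt((Hit h) -> h.whatTier)
                .thenComparingInt(h -> h.whereTier)
                .thenComparingInt(h -> h.ranking));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("c", 2, 1, 1));
        hits.add(new Hit("a", 1, 2, 1));
        hits.add(new Hit("b", 1, 1, 2));
        hits.add(new Hit("d", 1, 1, 1));
        for (Hit h : order(hits, 42L)) {
            System.out.println(h.id);
        }
    }
}
```

Keeping the seed fixed for the duration of a user session would keep the
random order stable from one page to the next, which matters if paging is
required.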

Sort by function workaround for Solr 1.4

2010-05-28 Thread Martynas Miliauskas

Hi,

I need to sort query results by the output of a function that takes
"score" and a couple of other fields as input (50% of the total score comes
from the similarity score and 50% comes from the document's popularity). Is
there a workaround which does not involve installing any patches?

I have found a user comment on this page
https://issues.apache.org/jira/browse/SOLR-1297 (Enable sorting by function
query) which mentions a workaround, "(main query)^0 func(...)", that can be
used to sort results by function without having to install the SOLR-1297
patch. Could anyone please explain this workaround in a bit more detail?
Maybe someone could give me an example of using it?

Thank you!
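
For reference, the 50/50 weighting described above amounts to the
arithmetic below. This only illustrates the formula and says nothing about
how to make Solr sort by it. It assumes both inputs have already been
normalized to the same 0..1 range, which raw relevance scores are not; the
class and method names are made up.

```java
final class BlendedScore {

    // Equal-weight blend of two values assumed to be normalized to [0, 1].
    static double blend(double similarity, double popularity) {
        return 0.5 * similarity + 0.5 * popularity;
    }

    public static void main(String[] args) {
        // A very relevant but unpopular document versus a less relevant, popular one.
        System.out.println(blend(0.9, 0.1));
        System.out.println(blend(0.5, 0.8));
    }
}
```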

strange results with query and hyphened words

2010-05-28 Thread Markus.Rietzler

I am wondering why a search term with a hyphen doesn't match.

My search term is "profi-auskunft". In WordDelimiterFilterFactory I have
catenateWords enabled, so my understanding is that profi-auskunft would also
search for profiauskunft. When I use the analysis panel in the Solr admin I
see that profi-auskunft matches a term "profiauskunft".

The analysis shows:

Query Analyzer
WhitespaceTokenizerFactory 
profi-auskunft
SynonymFilterFactory 
profi-auskunft
StopFilterFactory 
profi-auskunft

WordDelimiterFilterFactory 

term position      1       2
term text          profi   auskunft
                           profiauskunft
term type          word    word
                           word
source start,end   0,5     6,14
                           0,15

LowerCaseFilterFactory 
SnowballPorterFilterFactory 

Why are auskunft and profiauskunft in one column? How do they get
searched?

When I search for "profiauskunft" I get 230 hits; when I search for
"profi-auskunft" I get fewer hits. When I run the search with
debugQuery=on I see

body:"profi (auskunft profiauskunft)"

What does this query mean? profi and "auskunft or profiauskunft"?
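
To make the analysis output above easier to read, here is a rough
plain-Java imitation of what the table shows for a hyphenated term with
catenateWords enabled: each part gets its own position, and the joined form
is stacked at the position of the last part. It is not the Lucene filter
itself, only a sketch of the positions reported above; the class and method
names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HyphenPositions {

    // Returns position -> tokens found at that position (positions start at 1).
    static Map<Integer, List<String>> analyze(String term) {
        String[] parts = term.toLowerCase().split("-");
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            List<String> tokens = new ArrayList<>();
            tokens.add(parts[i]);
            byPosition.put(i + 1, tokens);
        }
        if (parts.length > 1) {
            // The joined form shares the position of the last part, as in the table.
            byPosition.get(parts.length).add(String.join("", parts));
        }
        return byPosition;
    }

    public static void main(String[] args) {
        System.out.println(analyze("profi-auskunft"));
        // prints {1=[profi], 2=[auskunft, profiauskunft]}
    }
}
```

Read this way, body:"profi (auskunft profiauskunft)" is a phrase query:
profi at one position, followed at the next position by either auskunft or
profiauskunft. If that reading is right, a document that contains only the
single word profiauskunft has no preceding profi and would not match, which
may explain the lower hit count.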

Interleaving the results

2010-05-28 Thread NarasimhaRaju

Hi,
How can I achieve a custom ordering of the documents when there is a general query?

Usecase:
Interleave documents from different customers one after the other.

Example:
Say I have 10 documents in the index belonging to 3 customers (customer_id
field in the index) and I use the query *:*, so all the documents in the
results score the same.
I want the results to be interleaved: one document from each customer
should appear before a document from the same customer repeats.

Is there a way to achieve this?


Thanks in advance 

R.
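
One way to get the ordering described above is to regroup an already
fetched result list on the client. The sketch below does a round-robin over
customers; it is a post-processing step, not a Solr feature, and the Doc
class and its fields are hypothetical. Because it reorders only the rows it
is given, it interleaves within one page of results rather than across the
whole result set.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Interleaver {

    // Hypothetical minimal document: an id plus the customer it belongs to.
    static final class Doc {
        final String id;
        final String customerId;

        Doc(String id, String customerId) {
            this.id = id;
            this.customerId = customerId;
        }
    }

    static List<Doc> interleave(List<Doc> docs) {
        // Group by customer, keeping first-seen customer order and the
        // original document order inside each customer.
        Map<String, Deque<Doc>> byCustomer = new LinkedHashMap<>();
        for (Doc d : docs) {
            byCustomer.computeIfAbsent(d.customerId, k -> new ArrayDeque<>()).add(d);
        }
        List<Doc> result = new ArrayList<>(docs.size());
        // Each pass takes at most one document per customer.
        while (result.size() < docs.size()) {
            for (Deque<Doc> queue : byCustomer.values()) {
                Doc next = queue.poll();
                if (next != null) {
                    result.add(next);
                }
            }
        }
        return result;
    }
}
```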

Prefix-Search with Stopwords - no results?

2010-05-28 Thread Gert Brinkmann



Hello,

I am having some problems with Solr 1.4. I am indexing and querying data
using the following fieldType:

[fieldType XML definition missing: the markup was stripped by the list archive]

The application that is using solr does prepare the search string to 
filter out some dangerous characters like brackets and wildcards, etc, 
that otherwise might lead to a wrong query syntax.


All words are searched for as a normal word as well as a prefix. E.g.: 
"für solr" is converted by the application to

  (für OR für*) AND (solr OR solr*)

This works fine for normal words. But if you have a stopword like "für" 
in this example, the query will be stopword filtered by solr to 
something like this:

  (für*) AND (solr OR solr*)

The problem now is (as I think) that there is no "für*" anymore in the 
indexed data, because it was stopword filtered, too. If now someone 
copy&pastes a sentence from an indexed document that contains a 
stopword, this document will not be found by solr.
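
The rewriting described above can be sketched as follows. This is a guess
at what the application does, based only on the "für solr" example in this
message: the class name and the whitespace splitting are assumptions, and
the step that filters out dangerous characters is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixQueryBuilder {

    // Turns each word into "(word OR word*)" and joins the groups with AND.
    static String build(String input) {
        List<String> clauses = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input produces no clauses
            }
            clauses.add("(" + word + " OR " + word + "*)");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // \u00fc is u-umlaut, written as an escape to stay encoding-safe.
        System.out.println(build("f\u00fcr solr"));
    }
}
```

Only the plain word in each group is exposed to the stopword filter; the
wildcard form survives, which gives the "(für*) AND (solr OR solr*)" shape
shown above.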


The enablePositionIncrements="true" only is (AFAIU) for querying 
phrases, but not for my case of "word OR word*" queries.


So, what should I do? Is there a better filter combination that I could 
try? Or am I doing something wrong conceptually? The only solution that 
I have found working is to not use stopword filtering at all.


Greetings,
Gert

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Mark Bennett

Thanks Geert,

Trip Advisor was interesting, I also see another "sliders" site was sent
around.

But I don't think all their Facets are "binding".

For example, to test no-results, I set it to 4 star hotels in SF with a max
of $50 / night - obviously not reasonable.

But it showed some hotels. At first I thought maybe some cool deals, but
then noticed that plenty of them were way under four stars.  I could
rationalize this by saying that the slider values represent other query
parameters, to be weighted in relevancy calculations along with the search
terms, but generally not what folks expect.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, May 27, 2010 at 2:32 PM, Geert-Jan Brits  wrote:

> Something like sliders perhaps?
> Of course only numerical ranges can be put into sliders. (or a concept that
> may be logically presented as some sort of ordering, such as "bad, hmm,
> good, great")
>
> Use Solr's Statscomponent to show the min and max values
>
> Have a look at tripadvisor.com for good uses/implementation of sliders
> (price, and reviewscore are presented as sliders)
> my 2c: try to make the possible input values discrete (like at tripadvisor)
> which gives a better user experience and limits the potential nr of queries
> (cache-wise advantage)
>
> Cheers,
> Geert-Jan
>
> 2010/5/27 Mark Bennett 
>
> > I'm a big fan of plain old text facets (or tags), displayed in some logical
> > order, perhaps with a bit of indenting to help convey context. But as you
> > may have noticed, I don't rule the world.  :-)
> >
> > Suppose you took the opposite approach, rendering facets in non-traditional
> > ways, that were still functional, and not ugly.
> >
> > Are there any public sites that come to mind that are displaying facets,
> > tags, clusters, taxonomies or other navigators in really innovative ways?
> >  And what you liked / didn't like?
> >
> > Right now I'm just looking for examples of what's been tried.  I suppose
> > even bad examples might be educational.
> >
> > My future ideal wish list:
> > * Stays out of the way (of casual users)
> > * Looks "clean" and "cool" (to the power users)
> >I'm thinking for example a light gray chevron ">>" that casual users
> > don't notice,
> >but when you click on it, cool things come up?
> > * Probably that does not require Flash or SilverLight (just to avoid the
> > whole platform wars)
> >I guess that means Ajax or HTML5
> > * And since I'm doing pie in the sky, can be made to look good on desktops
> > and mobile
> >
> > Some examples to get the ball rolling:
> >
> > StackOverflow, Flickr and YouTube, Clusty(now Yippy) are all nice, but a bit
> > pedestrian for my mission today.
> > (grokker was cool too)
> >
> > Lucid has done a nice job with Facets and Solr:
> > http://www.lucidimagination.com/search/
> > And although I really like it, it's not a flashy enough specimen for what
> > I'm hunting today.
> > (and they should thread the actual results list)
> >
> > I did some mockups of "2.0 style" search navigators a couple years back:
> > http://www.ideaeng.com/tabId/98/itemId/115/Search-20-in-the-Enterprise-Moving-Beyond-Singl.aspx
> > Though these were intentionally NOT derived from specific web sites.
> >
> > Digg has done some cool stuff, for example:
> > http://labs.digg.com/365/
> > http://labs.digg.com/arc/
> > http://labs.digg.com/stack/
> > But for what I'm after, these are a bit too far off of the "searching for
> > something in particular" track.
> >
> > Google Image Swirl and Similar Images are interesting, but for images.
> > Lots of other cool stuff at labs.google.com
> >
> > Amazon, NewEgg, etc are all fine, but again text based.
> >
> > TouchGraph has some cool stuff, though very non-linear (many others on this
> > theme)
> > http://www.touchgraph.com/TGGoogleBrowser.html
> > http://www.touchgraph.com/navigator.html
> >
> >
> > Cool articles on the subject: (some examples now offline)
> > http://www.cs.umd.edu/class/spring2005/cmsc838s/viz4all/viz4all_a.html
> >
> >
> >
> > --
> > Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >
>

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Mark Bennett

Hi Lukas,

Displaying 2 numbers is an interesting variant.  Not for a casual consumer
site, but actually pretty cool for a site appealing to engineers.

On the formatting front though, the (nn/mm) is a bit visually "dense".
 Might I suggest some tweaks:
1: Drop the parentheses, in favor of some other visual separation, but
cutting down on the number of characters
2: Change the "/" to "(space) of (space)"
3: Instead of making the numbers more bold than the text, perhaps go the
opposite way, making them non-bold, perhaps smaller or italic

So instead of:

Some value *(50/60)*

You'd have:

Some value *- 50 of 60*

Something like that; I'm no artist.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, May 27, 2010 at 2:37 PM, Lukas Kahwe Smith wrote:

>
> On 27.05.2010, at 23:32, Geert-Jan Brits wrote:
>
> > Something like sliders perhaps?
> > Of course only numerical ranges can be put into sliders. (or a concept that
> > may be logically presented as some sort of ordering, such as "bad, hmm,
> > good, great")
> >
> > Use Solr's Statscomponent to show the min and max values
> >
> > Have a look at tripadvisor.com for good uses/implementation of sliders
> > (price, and reviewscore are presented as sliders)
> > my 2c: try to make the possible input values discrete (like at tripadvisor)
> > which gives a better user experience and limits the potential nr of queries
> > (cache-wise advantage)
>
>
> yeah i have been pondering something similar. but i now realized that this
> way the user doesn't get an overview of the distribution without actually
> applying the filter. that being said, it would be nice to display 3 numbers
> with the sliders, the count of items that were filtered out on the lower and
> upper boundaries as well as the number of items still left (*).
>
> aside from this i just put a little tweak to my faceting online:
> http://search.un-informed.org/search?q=malaria&tm=any&s=Search
>
> if you deselect any of the checkboxes, it updates the counts. however i
> display both the count without and with those additional checkbox filters
> applied (actually i only display two numbers if they are not the same):
> http://screencast.com/t/MWUzYWZkY2Yt
>
> regards,
> Lukas Kahwe Smith
> m...@pooteeweet.org
>
> (*) if anyone has a slider that can do the above i would love to integrate
> that and replace the adoption year checkboxes with that

Changing schema without having to reindex

2010-05-28 Thread David


Hi,

Can anyone tell me if it is possible to change the schema without having 
to reindex? I want to change the stored fields specifically.  Any help 
would be appreciated, thanks.

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Mark Bennett

Haha!  Important tooltips are now "deprecated" in Web Applications.

This is nothing "official", of course.

But it's being advised to avoid important UI tasks that require cursor
tracking, mouse-over, hovering, etc. in web applications.

Why?  Many touch-centric mobile devices don't support "hover".  I'm
used to my laptop, where the touch pad or stylus *is* able to measure the
pressure.  But the finger based touch devices generally can't differentiate
it, I guess.

They *can* tell one gesture from another, but only by looking at the timing
and shape.  And hapless hover ain't one of them.

With that said, I'm still a fan of Tool Tips in desktop IDE's like Eclipse,
or even on Web applications when I'm on a desktop.

I guess the point is that, if it's a really important thing, then you need
to expose it in another way on mobile.

Just passing this on, please don't shoot the messenger.  ;-)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, May 27, 2010 at 2:55 PM, Geert-Jan Brits  wrote:

> Perhaps you could show the 'nr of items left' as a tooltip of sorts when
> the
> user actually drags the slider.
> If the user doesn't drag (or hovers over ) the slider 'nr of items left'
> isn't shown.
>
> Moreover, initially a slider doesn't limit the results so 'nr of items
> left'
> shown for the slider would be the same as the overall number of items left
> (thereby being redundant)
>
> I must say I haven't seen this implemented, but it would be rather easy
> to adapt a slider implementation to show the nr on drag/hover.  (They
> exist for jquery, scriptaculous and a bunch of other libs.)
>
> Geert-Jan
>
> 2010/5/27 Lukas Kahwe Smith 
>
> >
> > On 27.05.2010, at 23:32, Geert-Jan Brits wrote:
> >
> > > Something like sliders perhaps?
> > > Of course only numerical ranges can be put into sliders (or a concept
> > > that may be logically presented as some sort of ordering, such as
> > > "bad, hmm, good, great").
> > >
> > > Use Solr's Statscomponent to show the min and max values
> > >
> > > Have a look at tripadvisor.com for good uses/implementation of sliders
> > > (price, and reviewscore are presented as sliders)
> > > my 2c: try to make the possible input values discrete (like at
> > tripadvisor)
> > > which gives a better user experience and limits the potential nr of
> > queries
> > > (cache-wise advantage)
> >
> >
> > yeah i have been pondering something similar. but i now realized that
> > this way the user doesn't get an overview of the distribution without
> > actually applying the filter. that being said, it would be nice to display
> > 3 numbers with the sliders: the count of items that were filtered out on
> > the lower and upper boundaries as well as the number of items still left (*).
> >
> > aside from this i just put a little tweak to my faceting online:
> > http://search.un-informed.org/search?q=malaria&tm=any&s=Search
> >
> > if you deselect any of the checkboxes, it updates the counts. however i
> > display both the count without and with those additional checkbox filters
> > applied (actually i only display two numbers if they are not the same):
> > http://screencast.com/t/MWUzYWZkY2Yt
> >
> > regards,
> > Lukas Kahwe Smith
> > m...@pooteeweet.org
> >
> > (*) if anyone has a slider that can do the above i would love to integrate
> > that and replace the adoption year checkboxes with that
>

Storing different entities in Solr

2010-05-28 Thread Moazzam Khan

Hi Guys,

Is there a way to store 2 types of things in Solr? We have a list of
consultants and a list of consultation requests, and I want to store
them as separate documents. Can I do this with one instance of Solr or
do I have to have two instances?

Thanks,

MOazzam

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin

Good read here: http://mysolr.com/tips/denormalized-data-structure/ .

Are consultation requests unique to each consultant? In that case you could 
represent the request as a JSON string and store it as a multi-valued string 
field for each consultant, though that makes querying against requests 
trickier. If you need to search against specific fields in the consultant 
requests then you could try a schema where the consultant is your primary 
entity and have fields like

consultantrequests-field1,
consultantrequests-field2,
consultantrequests-field3

and then one
consultantrequests-fulljson

all multi-valued. You could query against the specific fields, then associate 
to the whole request by searching the json object. It's an approach I've used 
with success. 

-Kallin Nagelberg

-Original Message-
From: Moazzam Khan [mailto:moazz...@gmail.com] 
Sent: Friday, May 28, 2010 12:17 PM
To: solr-user@lucene.apache.org
Subject: Storing different entities in Solr

Hi Guys,

Is there a way to store 2 types of things in Solr. We have a list of
consultants and a list of consultation requests. and I want to store
them as separate documents. Can I do this with one instance of Solr or
do I have to have two instances?

Thanks,

MOazzam

Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Phillip Rhodes

Hi.
I am building up a query with quite a bit of logic such as parentheses, plus
signs, etc... and it's a little tedious dealing with it all at a string
level.  I was wondering if anyone has any thoughts on constructing the query
in lucene and using the string representation of the query to send to solr.

Thanks,
Phillip

Re: Storing different entities in Solr

2010-05-28 Thread Robert Zotter


Sounds like you'll want to use a multiple core setup. One core for each type
of "document".

http://wiki.apache.org/solr/CoreAdmin
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin

Multi-core is an option, but keep in mind if you go that route you will need to 
do two searches to correlate data between the two. 

-Kallin Nagelberg

-Original Message-
From: Robert Zotter [mailto:robertzot...@gmail.com] 
Sent: Friday, May 28, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing different entities in Solr

Sounds like you'll want to use a multiple core setup. One core fore each type
of "document"

http://wiki.apache.org/solr/CoreAdmin
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Sven Maurmann


Hi Phillip,

could you give me some more information about your environment? A first idea
that comes to my mind is to use SearchComponents to solve your problem. You
could either replace the whole QueryComponent (not recommended) or write a
(probably small) SearchComponent that creates the Lucene query and puts it
into the appropriate place in the ResponseBuilder. If you add such a
component to "first-components" in your handler definition, your query will
be executed.

Regards,

Sven

--On Freitag, 28. Mai 2010 12:23 -0400 Phillip Rhodes 
 wrote:



Hi.
I am building up a query with quite a bit of logic such as parentheses,
plus signs, etc... and it's a little tedious dealing with it all at a
string level.  I was wondering if anyone has any thoughts on constructing
the query in lucene and using the string representation of the query to
send to solr.

Thanks,
Phillip

Re: Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Ryan McKinley

Interesting -- I don't think there is anything that does this.

Though it seems like something the XML Query syntax should be able to
do, but we would still need to add the ability to send the xml style
query to solr.
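
As a stopgap, the parentheses and operators can be hidden behind a few small
helper methods that still produce a plain query string. This is only a sketch
with made-up names (QueryStrings, term, phrase, and, or, required); it is not
a Lucene or Solr API, and term() does not escape special characters in its
arguments.

```java
// Hypothetical helper, not part of Lucene or Solr: composes a query string
// in the standard query syntax so the caller never types a parenthesis.
final class QueryStrings {

    /** field:value (the value is used as-is, without escaping). */
    static String term(String field, String value) {
        return field + ":" + value;
    }

    /** field:"some phrase" with embedded backslashes and quotes escaped. */
    static String phrase(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    /** (a OR b OR c) */
    static String or(String... clauses) {
        return group(" OR ", clauses);
    }

    /** (a AND b AND c) */
    static String and(String... clauses) {
        return group(" AND ", clauses);
    }

    /** +clause, i.e. the clause must match. */
    static String required(String clause) {
        return "+" + clause;
    }

    private static String group(String operator, String... clauses) {
        return "(" + String.join(operator, clauses) + ")";
    }

    public static void main(String[] args) {
        String q = and(
                or(term("title", "solr"), term("title", "lucene")),
                phrase("body", "field collapsing"));
        System.out.println(q);
    }
}
```

The resulting string can be passed as the q parameter like any hand-written
query; the field names above are placeholders.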

On Fri, May 28, 2010 at 12:23 PM, Phillip Rhodes
 wrote:
> Hi.
> I am building up a query with quite a bit of logic such as parentheses, plus
> signs, etc... and it's a little tedious dealing with it all at a string
> level.  I was wondering if anyone has any thoughts on constructing the query
> in lucene and using the string representation of the query to send to solr.
>
> Thanks,
> Phillip
>

Re: Storing different entities in Solr

2010-05-28 Thread Bill Au

You can keep different types of documents in the same index.  If each
document has a type field, you can restrict your searches to specific
type(s) of document by using a filter query, which is very fast and
efficient.
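
A minimal sketch of what such a request could look like, assuming a field
named "type" with the values "consultant" and "request" (both names are
assumptions, not something from the original schema). It only builds the
parameter string; q carries the user's search and fq carries the type
restriction.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: the field name "type" and its values are assumed.
final class TypeFilterQuery {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the q and fq parameters for one document type. */
    static String params(String userQuery, String docType) {
        return "q=" + enc(userQuery) + "&fq=" + enc("type:" + docType);
    }

    public static void main(String[] args) {
        System.out.println(params("java architect", "consultant"));
        System.out.println(params("migration", "request"));
    }
}
```

The string would be appended to the select URL of the single Solr instance;
both document types then share one index and one uniqueKey field.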

Bill

On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> Multi-core is an option, but keep in mind if you go that route you will
> need to do two searches to correlate data between the two.
>
> -Kallin Nagelberg
>
> -Original Message-
> From: Robert Zotter [mailto:robertzot...@gmail.com]
> Sent: Friday, May 28, 2010 12:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Storing different entities in Solr
>
>
> Sounds like you'll want to use a multiple core setup. One core fore each
> type
> of "document"
>
> http://wiki.apache.org/solr/CoreAdmin
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

SolrJ Unicode problem

2010-05-28 Thread Hugh Cayless

Hi, I'm a solr newbie, and I'm hoping someone can point me in the right 
direction.

I'm trying to index a bunch of documents with Greek text in them.  I can 
successfully index documents by generating add xml and using curl to send them 
to my server, but when I use solrj to create and send documents, the encoding 
gets thoroughly messed up.

Instead of the result (from an add doc posted with curl):


  
c.etiq.mom;;2077
Της Βησο ς Χρη εις Πανοπολίτης
  


I get (from a SolrInputDocument loaded with solrj):

 
  
  c.etiq.mom;;2077 
  ???  ? ??? ??? �?? 
  


I can confirm that the SolrInputDocument's transcription field contains Greek 
text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I 
can get Greek back out of it).  So I don't know what to do next.  Any ideas?

Thanks,
Hugh

Re: Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Ken Krugler



On May 28, 2010, at 9:23am, Phillip Rhodes wrote:


Hi.
I am building up a query with quite a bit of logic such as  
parentheses, plus
signs, etc... and it's a little tedious dealing with it all at a  
string
level.  I was wondering if anyone has any thoughts on constructing  
the query
in lucene and using the string representation of the query to send  
to solr.


Depending on complexity, SolrJ could be a solution.

See the section that talks about "SolrJ provides a APIs to create  
queries instead of hand coding the query..." on http://wiki.apache.org/solr/Solrj


-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g

Re: Sort by function workaround for Solr 1.4

2010-05-28 Thread Blargy


How would this be any different than simply using the function to alter the
scoring of the final results and then sorting by score?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Sort-by-function-workaround-for-Solr-1-4-tp851922p852471.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Storing different entities in Solr

2010-05-28 Thread Moazzam Khan

Thanks for all your answers guys. Requests and consultants have a many
to many relationship so I can't store request info in a document with
advisorID as the primary key.

Bill's solution and multicore solutions might be what I am looking
for. Bill, will I be able to have 2 primary keys (so I can update and
delete documents)? If yes, can you please give me a link or something
where I can get more info on this?

Thanks,
Moazzam



On Fri, May 28, 2010 at 11:50 AM, Bill Au  wrote:
> You can keep different type of documents in the same index.  If each
> document has a type field.  You can restrict your searches to specific
> type(s) of document by using a filter query, which is very fast and
> efficient.
>
> Bill
>
> On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
> knagelb...@globeandmail.com> wrote:
>
>> Multi-core is an option, but keep in mind if you go that route you will
>> need to do two searches to correlate data between the two.
>>
>> -Kallin Nagelberg
>>
>> -Original Message-
>> From: Robert Zotter [mailto:robertzot...@gmail.com]
>> Sent: Friday, May 28, 2010 12:26 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Storing different entities in Solr
>>
>>
>> Sounds like you'll want to use a multiple core setup. One core fore each
>> type
>> of "document"
>>
>> http://wiki.apache.org/solr/CoreAdmin
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>

Re: Sort by function workaround for Solr 1.4

2010-05-28 Thread Martynas Miliauskas

The problem with using query functions is that I don't know how to equally
scale similarity score and output of the function. What I mean is that query
output would be in the range from 0..1 and function output would be in the
range from 0..1. Well I think I do know how to scale function output to that
range but what about similarity output then?

On Fri, May 28, 2010 at 6:02 PM, Blargy  wrote:

>
> How would this be any different than simply using the function to alter the
> scoring of the final results and then sorting by score?
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Sort-by-function-workaround-for-Solr-1-4-tp851922p852471.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Storing different entities in Solr

2010-05-28 Thread David Stuart


Hi,

So for your use case, do you want to search for a consultant and then  
look at all of his or her requests, or pull both at the same time? In  
both cases one index should suffice. If you define a primary key field  
and use it for both doc types it shouldn't be an issue. Unless your  
dataset is very large, it would reduce the overhead of running a  
multicore solution, especially in indexing etc.


David Stuart

On 28 May 2010, at 18:12, Moazzam Khan  wrote:


Thanks for all your answers guys. Requests and consultants have a many
to many relationship so I can't store request info in a document with
advisorID as the primary key.

Bill's solution and multicore solutions might be what I am looking
for. Bill, will I be able to have 2 primary keys (so I can update and
delete documents)? If yes, can you please give me a link or someting
where I can get more info on this?

Thanks,
Moazzam



On Fri, May 28, 2010 at 11:50 AM, Bill Au  wrote:

You can keep different type of documents in the same index.  If each
document has a type field.  You can restrict your searches to  
specific

type(s) of document by using a filter query, which is very fast and
efficient.

Bill

On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

Multi-core is an option, but keep in mind if you go that route you  
will

need to do two searches to correlate data between the two.

-Kallin Nagelberg

-Original Message-
From: Robert Zotter [mailto:robertzot...@gmail.com]
Sent: Friday, May 28, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing different entities in Solr


Sounds like you'll want to use a multiple core setup. One core  
fore each

type
of "document"

http://wiki.apache.org/solr/CoreAdmin
--
View this message in context:
http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Prefix-Search with Stopwords - no results?

2010-05-28 Thread Erick Erickson

Hmmm, I don't really see the problem here. I'll have to use English
examples...

Searching on the* (assuming the is a stopword) will search on
(them OR theory OR thespian) assuming those three words are in
your index. It will NOT search on the. So I think you're OK, or are
you seeing anomalous results?

Conceptually, the underlying lucene looks through your *existing* list of
terms for the field to assemble a clause containing the OR of all the
terms that match the wildcard. Since "the" isn't in the index, it doesn't
get included.
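
The expansion described above can be mimicked with a sorted set of terms.
This is a conceptual sketch only, not how Lucene is actually implemented; the
class and method names are made up, and the term list is the one from the
example plus two unrelated words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Conceptual model of prefix expansion against the terms that exist in the
// index. Names are hypothetical; this is not Lucene code.
final class PrefixExpansion {

    /** Returns every indexed term that starts with the given prefix. */
    static List<String> expand(SortedSet<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are
        // contiguous in sorted order, so stop at the first non-match.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    /** Renders the matches as one OR clause. */
    static String toOrClause(List<String> terms) {
        return "(" + String.join(" OR ", terms) + ")";
    }

    public static void main(String[] args) {
        // "the" itself is absent because the stopword filter dropped it at
        // index time.
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("solr", "them", "theory", "thespian", "zebra"));
        System.out.println(toOrClause(expand(terms, "the")));
    }
}
```

With this term list the output is (them OR theory OR thespian), which is the
clause from the example: the stopword never appears because it was never
indexed.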

HTH
Erick

On Fri, May 28, 2010 at 11:25 AM, Gert Brinkmann  wrote:

>
> Hello,
>
> I am having some problems with solr 1.4. I am indexing and querying data
> using the following fieldType:
>
> > positionIncrementGap="100">
>>  
>>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>
>>>ignoreCase="true"
>>words="stopwords_de_de.txt"
>>enablePositionIncrements="true"
>>/>
>>
>>> />
>>
>>  
>>  
>>
>>> synonyms="synonyms_de_de.txt" ignoreCase="true" expand="true"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>>  ignoreCase="true"
>>  words="stopwords_de_de.txt"
>>enablePositionIncrements="true"
>>  />
>>
>>> />
>>
>>  
>>
>>
>
> The application that is using solr does prepare the search string to filter
> out some dangerous characters like brackets and wildcards, etc, that
> otherwise might lead to a wrong query syntax.
>
> All words are searched for as a normal word as well as a prefix. E.g.: "für
> solr" is converted by the application to
>  (für OR für*) AND (solr OR solr*)
>
> This works fine for normal words. But if you have a stopword like "für" in
> this example, the query will be stopword filtered by solr to something like
> this:
>  (für*) AND (solr OR solr*)
>
> The problem now is (as I think) that there is no "für*" anymore in the
> indexed data, because it was stopword filtered, too. If now someone
> copy&pastes a sentence from an indexed document that contains a stopword,
> this document will not be found by solr.
>
> The enablePositionIncrements="true" only is (AFAIU) for querying phrases,
> but not for my case of "word OR word*" queries.
>
> So, what should I do? Is there a better filter combination that I could
> try? Or am I doing something wrong conceptually? The only solution that I
> have found working is to not use stopword filtering at all.
>
> Greetings,
> Gert
>
>

Re: Changing schema without having to reindex

2010-05-28 Thread Erick Erickson

No. You can add new documents which will reflect the new schema, but
you can't retroactively update your index.

In your specific example, it's not possible to losslessly recreate the data
to store from the indexed fields. Consider stopword removal, or lowercasing.

HTH
Erick

On Fri, May 28, 2010 at 11:56 AM, David  wrote:

> Hi,
>
> Can anyone tell me if it is possible to change the schema without having to
> reindex? I want to change the stored fields specifically.  Any help would be
> appreciated, thanks.
>
>

Re: AW: XSLT for JSON


: ok,but is there an easy way to influence the format of json output?
: eg field order, names etc. maybe i want to group the result differently or 
add some infos

Wouldn't that be easier to do on the client side once you have the json 
structure?

if not: then using the velocity writer to generate custom JSON is probably 
your best bet -- but the recommendation to use the JSON response writer 
seemed pretty straightforward based on your initial request...

: > > the new JSON format should exactly the same like the 
: > standard response ... 

...perhaps we just aren't understanding what you mean. giving us a 
specific example of what you want would be helpful

wild shot in the dark: is the problem perchance that you aren't aware of 
the "json.nl" param?

http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters


-Hoss

Re: Storing different entities in Solr

2010-05-28 Thread Erick Erickson

You most certainly *can* store the many<->many relationship, you
are just denormalizing your data. I know it goes against the grain
of any good database admin, but it's very often a good solution
for a search application.

You've gotta forget almost everything you learned about how data
*should* be stored in databases when working with a search app.
Well, perhaps I'm overstating a bit, but you get the idea

When I see messages about primary keys and foreign keys etc, I
break out in hives. It's almost always a mistake to try to force
lucene/solr to behave like a database. Whenever you find yourself
trying, stop, take a deep breath, and think about searching ...

A lot depends on how much data we're talking about here. If
fully denormalizing things would cost you 10M, who cares? If it
would cost you 100G, it's a different story

Best
Erick

On Fri, May 28, 2010 at 1:12 PM, Moazzam Khan  wrote:

> Thanks for all your answers guys. Requests and consultants have a many
> to many relationship so I can't store request info in a document with
> advisorID as the primary key.
>
> Bill's solution and multicore solutions might be what I am looking
> for. Bill, will I be able to have 2 primary keys (so I can update and
> delete documents)? If yes, can you please give me a link or someting
> where I can get more info on this?
>
> Thanks,
> Moazzam
>
>
>
> On Fri, May 28, 2010 at 11:50 AM, Bill Au  wrote:
> > You can keep different type of documents in the same index.  If each
> > document has a type field.  You can restrict your searches to specific
> > type(s) of document by using a filter query, which is very fast and
> > efficient.
> >
> > Bill
> >
> > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
> > knagelb...@globeandmail.com> wrote:
> >
> >> Multi-core is an option, but keep in mind if you go that route you will
> >> need to do two searches to correlate data between the two.
> >>
> >> -Kallin Nagelberg
> >>
> >> -Original Message-
> >> From: Robert Zotter [mailto:robertzot...@gmail.com]
> >> Sent: Friday, May 28, 2010 12:26 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Storing different entities in Solr
> >>
> >>
> >> Sounds like you'll want to use a multiple core setup. One core fore each
> >> type
> >> of "document"
> >>
> >> http://wiki.apache.org/solr/CoreAdmin
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
>

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin

I agree with Erick,

Could you show us what these two entities look like, and the total count of 
each? That might shed some light on the appropriate approach.

-Kallin Nagelberg

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, May 28, 2010 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing different entities in Solr

You most certainly *can* store the many<->many relationship, you
are just denormalizing your data. I know it goes against the grain
of any good database admin, but it's very often a good solution
for a search application.

You've gotta forget almost everything you learned about how data
*should* be stored in databases when working with a search app.
Well, perhaps I'm overstating a bit, but you get the idea

When I see messages about primary keys and foreign keys etc, I
break out in hives. It's almost always a mistake to try to force
lucene/solr to behave like a database. Whenever you find yourself
trying, stop, take a deep breath, and think about searching ...

A lot depends on how much data we're talking about here. If
fully denormalizing things would cost you 10M, who cares? If it
would cost you 100G, it's a different story

Best
Erick

On Fri, May 28, 2010 at 1:12 PM, Moazzam Khan  wrote:

> Thanks for all your answers guys. Requests and consultants have a many
> to many relationship so I can't store request info in a document with
> advisorID as the primary key.
>
> Bill's solution and multicore solutions might be what I am looking
> for. Bill, will I be able to have 2 primary keys (so I can update and
> delete documents)? If yes, can you please give me a link or someting
> where I can get more info on this?
>
> Thanks,
> Moazzam
>
>
>
> On Fri, May 28, 2010 at 11:50 AM, Bill Au  wrote:
> > You can keep different type of documents in the same index.  If each
> > document has a type field.  You can restrict your searches to specific
> > type(s) of document by using a filter query, which is very fast and
> > efficient.
> >
> > Bill
> >
> > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
> > knagelb...@globeandmail.com> wrote:
> >
> >> Multi-core is an option, but keep in mind if you go that route you will
> >> need to do two searches to correlate data between the two.
> >>
> >> -Kallin Nagelberg
> >>
> >> -Original Message-
> >> From: Robert Zotter [mailto:robertzot...@gmail.com]
> >> Sent: Friday, May 28, 2010 12:26 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Storing different entities in Solr
> >>
> >>
> >> Sounds like you'll want to use a multiple core setup. One core fore each
> >> type
> >> of "document"
> >>
> >> http://wiki.apache.org/solr/CoreAdmin
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
>

Re: NoSuchFieldError: submap


: Hi, I'm trying to build from source to apply the field collapsing patch.
: 'Ant dist' runs just fine, no errors, but at startup I get a
: "NoSuchFieldError: submap" exception (stack trace:
: http://pastebin.com/NXsf0KJS ). This is before sending any requests. I don't
: have any 'submap' field defined anywhere.
: Has anyone seen this? Any ideas?

the "field" in question isn't referring to a field in your index -- it's a 
java error referring to a field of a java class.

in a nutshell: some class file you are using at runtime is inconsistent 
with a class file that you used at compile time.  the "submap" field of 
some object is manipulated on line 89 of your SynonymFilter.java file, but 
that object doesn't have a "submap" field.

typically this type of problem happens when you don't have a clean 
classpath: older versions of jars are included as well, or the jars you 
compiled against aren't included but other different jars with the same 
classes in them are.
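
One way to check for this is to ask the JVM where a class was actually loaded
from. The sketch below uses only standard JDK calls; the class name
WhereLoaded is made up, and you would pass the fully qualified name of the
class from your stack trace as the argument, running it with the same
classpath as the webapp.

```java
import java.security.CodeSource;

// Prints the jar or directory a class was loaded from, to help spot a
// stale or duplicate jar on the classpath.
final class WhereLoaded {

    /** Returns the load location of the class, or a note if none is known. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source: typically a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to a JDK class so the program runs without arguments.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

If the printed location is not the jar you just built, that is a strong hint
that an older copy is being picked up first.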



-Hoss

Rebuild an index

2010-05-28 Thread Sai . Thumuluri

Hi, 
We use Drupal as the CMS and Solr for our search engine needs and are
planning to have Solr Master-Slave replication setup across the data
centers. I am in the process of testing my replication - what is the
best means to delete the index on the Solr slave and then replicate a
fresh copy from Master?  We use Solr 1.3.

Thanks,
Sai Thumuluri

My Master solrconfig.xml is 

  

  startup
  commit
  commit
  schema.xml,synonyms.txt,stopwords.txt,elevate.xml

  

And my slave solrconfig.xml


  

  http://masterURL:8080/solr/replication
  01:00:00

Re: Changing schema without having to reindex


: No. You can add new documents which will reflect the new schema, but
: you can't retroactively update your index.
: 
: In your specific example, it's not possible to losslessly recreate the data
: to store from the indexed fields. Consider stopword removal, or lowercasing.

To put it another way: there are an infinite number of changes you can 
make to the schema.xml which do not *require* reindexing -- but that 
doesn't mean that things will work the way you want them to.  You can add 
(or remove) stored fields, but that won't magically add new data to (or 
remove existing data from) any existing documents.



-Hoss

RE: SolrJ Unicode problem

2010-05-28 Thread Tim Gilbert

I had a similar problem a few days ago and I found that the documents were not 
being loaded correctly as UTF-8 into Solr.  In my case, the loader program was 
a Java .jar I was executing from a cron job.  There I added this:

java -Dfile.encoding=UTF-8 -jar /home/tim/solr/bin/loadSiteSearch.jar

Then, within that program, I wrote a function to take the strings I was loading 
and expressly declare them as UTF-8 like this:

private String toUTF8(String value)
{
return new String(value.getBytes(), "UTF-8");
}

and that solved the problem for me.

Tim

-Original Message-
From: Hugh Cayless [mailto:philomou...@gmail.com] 
Sent: Friday, May 28, 2010 12:51 PM
To: solr-user@lucene.apache.org
Subject: SolrJ Unicode problem

Hi, I'm a solr newbie, and I'm hoping someone can point me in the right 
direction.

I'm trying to index a bunch of documents with Greek text in them.  I can 
successfully index documents by generating add xml and using curl to send them 
to my server, but when I use solrj to create and send documents, the encoding 
gets throughly messed up.


Instead of the result (from an add doc posted with curl):


  
c.etiq.mom;;2077
Της Βησο ς Χρη εις Πανοπολίτης
  


I get (from a SolrInputDocument loaded with solrj):

 
  
  c.etiq.mom;;2077 
  ???  ? ??? ??? �?? 
  


I can confirm that the SolrInputDocument's transcription field contains Greek 
text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I 
can get Greek back out of it).  So I don't know what to do next.  Any ideas?

Thanks,
Hugh

Re: SolrJ Unicode problem

2010-05-28 Thread Hugh Cayless

Yeah, I just figured out that if I set

export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"

Everything works.  The OutputStreamWriter used by StreamingUpdateSolrServer 
uses the default encoding.  UTF-8 might be better, but maybe there are reasons 
not to hard-code it.
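
The effect can be reproduced outside Solr. The sketch below is an
illustration with made-up names, not SolrJ code: it writes a Greek string
through an OutputStreamWriter with an explicit charset, using US-ASCII to
stand in for a non-UTF-8 platform default. It also shows why re-decoding
bytes obtained from the default charset (the getBytes() approach quoted
below) cannot bring characters back once they have been replaced.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Demonstrates how the writer's charset decides whether Greek text survives.
final class EncodingDemo {

    /** Writes text through an OutputStreamWriter using the given charset. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Encodes and decodes with the same charset. */
    static String roundTrip(String text, Charset charset) {
        return new String(write(text, charset), charset);
    }

    /** Encodes with one charset, then decodes the bytes as UTF-8. */
    static String reinterpret(String text, Charset platformDefault) {
        return new String(text.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String greek = "\u03A4\u03B7\u03C2"; // the first word of the sample text
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek));
        System.out.println(roundTrip(greek, StandardCharsets.US_ASCII));
        System.out.println(reinterpret(greek, StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 the text survives; with a charset that cannot represent Greek,
each letter becomes "?" before it ever reaches the server, so setting
file.encoding (or passing the charset explicitly wherever a writer is
created) is what matters.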

Thanks,
Hugh

On May 28, 2010, at 3:02 PM, Tim Gilbert wrote:

> I had a similar problem a few days ago and I found that the documents where 
> not being loaded correctly as UTF-8 into Solr.  In my case, the loader 
> program was a Java.jar I was executing from a cron job.  There I added this:
> 
> java -Dfile.encoding=UTF-8 -jar /home/tim/solr/bin/loadSiteSearch.jar
> 
> Then, within that program, I wrote function to take the strings I was loading 
> and expressly declare them as UTF-8 like this:
> 
> private String toUTF8(String value)
> {
>   return new String(value.getBytes(), "UTF-8");
> }
> 
> and that solved the problem for me.
> 
> Tim
> 
> -Original Message-
> From: Hugh Cayless [mailto:philomou...@gmail.com] 
> Sent: Friday, May 28, 2010 12:51 PM
> To: solr-user@lucene.apache.org
> Subject: SolrJ Unicode problem
> 
> Hi, I'm a solr newbie, and I'm hoping someone can point me in the right 
> direction.
> 
> I'm trying to index a bunch of documents with Greek text in them.  I can 
> successfully index documents by generating add xml and using curl to send 
> them to my server, but when I use solrj to create and send documents, the 
> encoding gets throughly messed up.
> 
> 
> Instead of the result (from an add doc posted with curl):
> 
> 
>  
>c.etiq.mom;;2077
>Της Βησο ς Χρη εις Πανοπολίτης
>  
> 
> 
> I get (from a SolrInputDocument loaded with solrj):
> 
>  
>  
>  c.etiq.mom;;2077 
>  ???  ? ??? ??? �?? 
>  
> 
> 
> I can confirm that the SolrInputDocument's transcription field contains Greek 
> text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I 
> can get Greek back out of it).  So I don't know what to do next.  Any ideas?
> 
> Thanks,
> Hugh

Re: Sites with Innovative Presentation of Tags and Facets


: Perhaps you could show the 'nr of items left' as a tooltip of sorts when the
: user actually drags the slider.

Years ago, when we were first working on building Solr, a coworker of mine 
suggested using double bar sliders (ie: pick a range using a min and a 
max) for all numeric facets and putting "sparklines" above them to give 
the user a visual indication of the "spread" of documents across the 
numeric spectrum.

it was a little more complicated than anything we needed -- and seemed 
like a real pain in the ass to implement.  i still don't know of anyone 
doing anything like that, but it's definitely an interesting idea.

The hard part is really just deciding what "quantum" interval you want 
to use along the x axis to decide how to count the docs for the y axis.
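
To make the interval question concrete, here is a small sketch that counts
values into fixed-width buckets; the counts are what a sparkline would plot
on the y axis. The class name, the sample prices and the interval of 10 are
all made up for illustration. In practice the counts could come from Solr
itself (for example one facet query per bucket) rather than from raw values.

```java
import java.util.Arrays;

// Counts numeric values into equal-width buckets between min and max.
final class Buckets {

    static int[] histogram(double[] values, double min, double max, double interval) {
        int bucketCount = Math.max(1, (int) Math.ceil((max - min) / interval));
        int[] counts = new int[bucketCount];
        for (double value : values) {
            if (value < min || value > max) {
                continue; // outside the slider's range
            }
            int index = (int) ((value - min) / interval);
            if (index >= bucketCount) {
                index = bucketCount - 1; // value == max belongs to the last bucket
            }
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] prices = {5, 12, 18, 25, 40, 50};
        System.out.println(Arrays.toString(histogram(prices, 0, 50, 10)));
    }
}
```

A smaller interval gives a more detailed sparkline but more buckets to
count; a larger one is cheaper but hides the shape of the distribution.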

http://en.wikipedia.org/wiki/Sparkline
http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR


-Hoss

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Lukas Kahwe Smith


On 28.05.2010, at 21:31, Chris Hostetter wrote:

> 
> : Perhaps you could show the 'nr of items left' as a tooltip of sorts when the
> : user actually drags the slider.
> 
> Years ago, when we were first working on building Solr, a coworker of mind 
> suggested using double bar sliders (ie: pick a range using a min and a 
> max) for all numeric facets and putting "sparklines" above them to give 
> the user a visual indication of the "spread" of documents across the 
> numeric spectrum.
> 
> it wsa a little more complicated then anything we needed -- and seemed 
> like a real pain in hte ass to implement.  i still don't know of anyone 
> doing anything like that, but it's definitley an interesting idea.
> 
> The hard part is really just deciding what "quantum" interval you want 
> to use along the x axis to decide how to count the docs for the y axis.
> 
> http://en.wikipedia.org/wiki/Sparkline
> http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR


kayak.com uses a double slider to handle the flight departure range:
http://screencast.com/t/ZjExMTE5

regards,
Lukas Kahwe Smith
m...@pooteeweet.org

Re: Does SOLR Allow q= (A or B) AND (C or D)?

2010-05-28 Thread Ahmet Arslan

--- On Fri, 5/28/10, efr...@gmail.com  wrote:

> From: efr...@gmail.com 
> Subject: Re: Does SOLR Allow q= (A or B) AND (C or D)?
> To: solr-user@lucene.apache.org
> Date: Friday, May 28, 2010, 4:42 AM
> Hi Ahmet,
> 
> Thanks again for the feedback. We will be searching several
> fields of each
> object in the index (title, description, tags). The matches
> on keywords need
> to be in any of these fields and there will be no different
> weights.

Okay, after investigating your example website, I think I understand you now. 
You are going to display documents, not a particular field or keywords, so you 
can do it with my solution. Create two additional fields with the types I 
wrote. Copy your title, desc and tag fields into these fields. Execute the 
query on these two fields as the user types. You will see the same results as 
on your example website. Just don't forget to use quotes for the field that 
uses the keyword tokenizer, and () for the other one. Also, the default 
operator must be OR. When the user is typing the query tap water, your 
query will be
q=f1:"tap wate" f2:(tap wate)   just before the last keystroke.
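
A minimal sketch of building that query string on the client as the user types, in Python. The function name `typeahead_query` is hypothetical, `f1` and `f2` are the placeholder field names from the message, and the sketch assumes the typed text consists of plain words with no query-syntax characters that would need escaping.

```python
def typeahead_query(typed, phrase_field="f1", terms_field="f2"):
    """Build the two-clause query for the text typed so far.

    The keyword-tokenized field gets the whole input in quotes; the other
    field gets the same input grouped in parentheses.
    """
    text = " ".join(typed.split())  # collapse runs of whitespace
    if not text:
        return ""
    return f'{phrase_field}:"{text}" {terms_field}:({text})'


print(typeahead_query("tap wate"))  # f1:"tap wate" f2:(tap wate)
```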

Re: 'Minimum Should Match' on subquery level


: I need to use Lucene's  `minimum number should match` option of BooleanQuery
: on Solr.

Unfortunately, the Lucene QueryParser doesn't support any way of 
manipulating the minimumNumberShouldMatch property of BooleanQueries specified 
in that syntax.

I'm not sure of any way to do what you're looking for w/o some custom code 
(either customizing the QueryParser, or writing a QParser that modifies the 
BooleanQueries produced).








-Hoss

Re: Full Import failed


: I am just using the solr.war file that came with the Solr 1.4 download on
: weblogic.
: did not add any jar or remove any jar

I'm not sure what to tell you then -- somehow you have another copy of 
some classes in your classpath.

Did you copy the solr.war on top of an older solr.war?  Is it possible 
that weblogic un-wared an older version of solr into a working directory, 
and that older jars still exist in that directory alongside the new 
jars?
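
One way to check for stray copies is to scan the container's directories for every jar that contains the class in question. The sketch below is an assumption-laden illustration, not a Solr or WebLogic tool: the helper name `jars_containing` is hypothetical, and you would have to supply the real deployment directory yourself.

```python
import os
import zipfile


def jars_containing(root, class_name):
    """Return the sorted paths of all .jar files under `root` that contain
    the given fully qualified class (e.g. a.b.C -> a/b/C.class)."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipFile:
                continue  # not a readable archive; skip it
    return sorted(hits)
```

Calling `jars_containing(deploy_dir, "org.apache.solr.common.SolrInputDocument")` with your own `deploy_dir` and getting more than one path back would be consistent with the duplicate-classes theory. Jars nested inside a still-packed .war are not inspected by this sketch.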

: On Tue, May 25, 2010 at 9:54 PM, Chris Hostetter
: wrote:
: 
: >
: > : yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5
: >
: > Solr 1.4 works just fine with Java 1.5 -- even when Using the
: > DataImportHandler.
: >
: > there are some features of DIH like the  ScriptTransformer that requires
: > java 1.6, but that's not your issue...
: >
: > : > Last I encountered that exception was with the usage of String.isEmpty
: > : > which is a 1.6 novelty.
: >
: > ...the line in question in the stack trace provided has nothing to do with
: > String.isEmpty.
: >
: > >> Caused by: java.lang.NoSuchMethodError: isEmpty
: > >> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391)
: >
: > the object in question is a DocWrapper which inherits from
: > SolrInputDocument which defines isEmpty.  if you are getting this error it
: > suggests that something is wonky with your classpath, and you probably
: > have multiple versions of some solr jars getting included by mistake -- in
: > particular an old copy of the solr-common jar where SolrInputDocument is
: > defined.
: >
: >
: >
: > -Hoss
: >
: >
: 



-Hoss

Re: Sites with Innovative Presentation of Tags and Facets


: > Years ago, when we were first working on building Solr, a coworker of mind 
: > suggested using double bar sliders (ie: pick a range using a min and a 
: > max) for all numeric facets and putting "sparklines" above them to give 
: > the user a visual indication of the "spread" of documents across the 
: > numeric spectrum.
...
: > http://en.wikipedia.org/wiki/Sparkline
: > http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR


: kayak.com uses a double slider to handle the flight departure range:
: http://screencast.com/t/ZjExMTE5

Well, sure ... double bar sliders aren't really novel at all -- my point 
was the idea of putting a sparkline above the slider, so people had a 
visual indicator of how many results they would get by adjusting the bars 
to various points, before they ever even touched it (as opposed to a 
tooltip)


-Hoss

Re: SolrJ Unicode problem


: Everything works.  The OutputStreamWriter used by 
: StreamingUpdateSolrServer uses the default encoding.  UTF-8 might be 
: better, but maybe there are reasons not to hard-code it.

no, this was a bug that's been fixed in svn...

https://issues.apache.org/jira/browse/SOLR-1595

...it would probably be useful to allow the client to specify the encoding 
in some way, but the Content-Type set by SolrJ has UTF-8 hardcoded in it, so 
either way it was a bug.



-Hoss

Re: Prefix-Search with Stopwords - no results?


: Searching on the* (assuming the is a stopword) will search on
: (them OR theory OR thespian) assuming those three words are in
: your index. It will NOT search on the. So I think you're OK, or are
: you seeing anomalous results?

I think the missing pieces to the puzzle here are:

1) wildcard and prefix queries aren't analyzed, so "the*" (or "für*") 
doesn't get analyzed, and the system has no way of spotting that it's a 
stopword that should be removed from the query -- nor should it in general, 
since the fact that "the" is a stopword doesn't mean "the*" is an invalid 
query.  I could very conceivably be trying to find words like "thespian".

2) by using the "AND" operator you are forcing both clauses to match...

: >  (für*) AND (solr OR solr*)

...so that query will only turn up results if a document containing both a 
word that starts with "solr" and a word that starts with "für" exists in 
your index.

: > The problem now is (as I think) that there is no "für*" anymore in the
: > indexed data, because it was stopword filtered, too. If now someone

the *word* "für" doesn't exist in your index because it's a stopword, but 
there may be other words in your index starting with the prefix "für" -- 
and if those words appear in documents that also contain words starting 
with "solr" then you will actually get matches.

: > So, what should I do? Is there a better filter combination that I could
: > try? Or am I doing something wrong conceptually? The only solution that I
: > have found working is to not use stopword filtering at all.


I would suggest that instead of your existing approach of taking "word1 
word2 word3 ..." and converting it to "(word1 OR word1*) AND (word2 OR 
word2*) ..." in the client, you consider using multiple 
fields -- one "text" defined as you have it now, and one "text_prefix" 
defined similarly but with an additional EdgeNGramTokenFilter used when 
indexing to generate "prefix" tokens.  Then search those fields using 
dismax...

q=word1 word2 word3 & qf=text text_prefix & mm=100% & tie=0
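
To make the idea concrete, here is a rough Python sketch of what "prefix" tokens look like and how the request parameters above could be assembled by a client. It is an illustration only: `edge_ngrams` merely imitates the kind of output an edge n-gram filter produces (the `min_len`/`max_len` defaults are made up), `dismax_params` is a hypothetical helper, and `defType=dismax` is an assumption about how the dismax parser gets selected in your setup.

```python
from urllib.parse import urlencode


def edge_ngrams(token, min_len=1, max_len=15):
    """Return the leading substrings ("prefix" tokens) of a token."""
    return [token[:n] for n in range(min_len, min(len(token), max_len) + 1)]


def dismax_params(user_input):
    """Pass the user's words through untouched; the two fields do the work."""
    return {
        "defType": "dismax",       # assumption: dismax chosen via defType
        "q": user_input,
        "qf": "text text_prefix",  # field names from the message above
        "mm": "100%",
        "tie": "0",
    }


print(edge_ngrams("solr"))  # ['s', 'so', 'sol', 'solr']
print(urlencode(dismax_params("word1 word2 word3")))
```

Because the prefixes are generated at index time, a query word such as "sol" matches the indexed token "sol" directly in text_prefix, so the client no longer needs to append "*" to anything.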



-Hoss

Re: solr.solr.home


: Hi to everyone, I'm really sorry for the stupid question I'm doing, but I
: didn't understand how to set the java system property solr.solr.home to my
: solr home.

First off: you don't necessarily *have* to set the solr.solr.home system 
property -- there are two other ways of telling Solr where to find its 
"home" directory that also work (using ./solr and using JNDI)...

   http://wiki.apache.org/solr/SolrInstall#Setup

Second: how to set the system property (or how to use JNDI) to tell Solr 
where its home dir is largely depends on how you are running Solr -- 
in particular, what servlet container and how it's being started.

are you using Jetty? Tomcat? Resin? Websphere? etc...

if you give us more details about what you are doing (or have already 
done) we can better understand how to help you.





-Hoss

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Lukáš Vlček

On Fri, May 28, 2010 at 9:49 PM, Chris Hostetter
wrote:

>
> : > Years ago, when we were first working on building Solr, a coworker of
> mine
> : > suggested using double bar sliders (ie: pick a range using a min and a
> : > max) for all numeric facets and putting "sparklines" above them to give
> : > the user a visual indication of the "spread" of documents across the
> : > numeric spectrum.
> ...
> : > http://en.wikipedia.org/wiki/Sparkline
> : > http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
>
>
> : kayak.com uses a double slider to handle the flight departure range:
> : http://screencast.com/t/ZjExMTE5
>
> Well, sure ... double bar sliders aren't really novel at all -- my point
> was the idea of putting a sparkline above the slider, so people had a
> visual indicator of how many results they would get by adjusting the bars
> to various points, before they ever even touched it (as opposed to a
> tooltip)
>
>
> -Hoss
>
>
Hoss,

you mean something like the following?
http://hledani.rozhlas.cz/?query=jazz&back=&defaultNavigation=&;

(Sorry, it is in Czech language but the web ui is pretty straightforward)

Regards,
Lukas

Re: Sites with Innovative Presentation of Tags and Facets

2010-05-28 Thread Lukáš Vlček

Also http://markmail.org has some nice charts

Re: Sites with Innovative Presentation of Tags and Facets


: > you mean something like the following?
: > http://hledani.rozhlas.cz/?query=jazz&back=&defaultNavigation=&;

: Also http://markmail.org has some nice chart

Yeah ... those are close to what I mean -- but in both cases there is 
really one big visual graph of a single numeric value (ironically it's a 
timeline in both cases) ... I was thinking more along the lines of when a 
facet UI has *multiple* numeric facets.

Imagine if a site like kayak.com, for example, which has a search UI with 7 
numeric sliders (departure take off time, departure landing time, return 
take off time, return landing time, layover duration, trip duration, and 
price) showed you a small sparkline above each slider that showed you 
where the various options tended to cluster based on the other filters you 
had applied -- so you can see that most flights have layovers in the ~30 
minute range, and the key price point is around $99 ... but when you move 
the "take off time" slider to early in the morning the sparkline above 
layover duration shifts up to longer layovers, and the prices start 
trending up.


-Hoss

Re: Sort by function workaround for Solr 1.4


: I have found a user comment at this page
: https://issues.apache.org/jira/browse/SOLR-1297 (Enable sorting by function
: query) where he has mentioned that there is a workaround "(main query)^0
: func(...)" that can be used to sort results by function without having to
: install "SOLR-1297" patch. Could anyone please explain this workaround a bit
: in more detail? Maybe someone could give me an example of using it?

what that is in reference to is that instead of using something like...

   q=some query

...you would use...

   q=+(some query) _func_:"aFunction(yourPopularity)" 

Alternately, you may want to use the "boost" QParser which lets you 
multiply the scores from a regular query by the output of a function 
(which is probably more along the lines of what you want if your 
popularity ranking produces a number in a bounded range from [0-1]) ...

   q={!boost b=aFunction(yourPopularity) v=$x} & x=some query 
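
For clients that assemble these parameters programmatically, the two forms above could be produced roughly as follows. This is a sketch with hypothetical helper names; `aFunction` and `yourPopularity` are the placeholders from the message, not real function or field names.

```python
from urllib.parse import urlencode


def func_clause_query(main_query, function):
    """Main query as a required clause plus an optional _func_ clause."""
    return {"q": f'+({main_query}) _func_:"{function}"'}


def boost_query(main_query, function):
    """Wrap the main query in the boost QParser using parameter
    dereferencing (v=$x) so the user's query needs no extra quoting."""
    return {"q": f"{{!boost b={function} v=$x}}", "x": main_query}


print(func_clause_query("some query", "aFunction(yourPopularity)")["q"])
print(urlencode(boost_query("some query", "aFunction(yourPopularity)")))
```

The JIRA comment quoted above additionally puts ^0 on the main query; that variant would presumably take the main query's own score out of the sum so that only the function value determines the order.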


-Hoss

Re: Help with query boosting syntax


: > When a person searches for keywords eg value1 value2 value3 we want to
: > apply boosting so that a document is boosted according to which of the
: > keywords it has.
: > eg of url : q=value1^4.0 OR value2^2.0 OR value3
...
: > I have a requesthandler set up to just search on our Keyword column which
...
: UPDATE: 
: I have managed to get this syntax working
: 
select/?qt=KeywordSearch&q=greenpoint&bq=Keyword:work^1.5&bq=Keyword:Pigalle^1.2
: But this just seems quite cumbersome having to specify a bq each time. Also
: I cannot get more than one q= to work.

...you haven't told us anything about how your "KeywordSearch" handler is
declared, but presumably if the "bq" param is working for you you are 
using either the dismax handler or the dismax QParser.

dismax doesn't support the "^2.3" type boost syntax in the query param (q) -- 
it supports it in the query fields param (qf).  The theory behind dismax is 
that the "user" should just provide "words", while the "admin" who configures 
it should know that certain *fields* should be boosted (not words).

if you have an external system that already knows that value1 should get a 
boost of 4.0, and value2 should get a boost of 2.0 then there is really no 
reason to be using dismax -- just use the "lucene" QParser (which is the 
default for SearchHandler) and set the default field to be your keyword 
field...

  /select? df=Keyword & q=value1^4.0 value2^2.0 value3
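
If the external system hands you the boosts as data, the query string can be generated instead of typed by hand. A small sketch under stated assumptions: `boosted_query` and `select_params` are hypothetical helpers, terms are assumed to be plain words that need no escaping, and `Keyword` is the field name used above.

```python
def boosted_query(term_boosts):
    """Join (term, boost) pairs into lucene syntax; a boost of None
    leaves the term unboosted."""
    parts = []
    for term, boost in term_boosts:
        parts.append(term if boost is None else f"{term}^{boost}")
    return " ".join(parts)


def select_params(term_boosts, default_field="Keyword"):
    """Request parameters for the default (lucene) query parser."""
    return {"df": default_field, "q": boosted_query(term_boosts)}


print(select_params([("value1", 4.0), ("value2", 2.0), ("value3", None)]))
```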



-Hoss

Re: Does SOLR Allow q= (A or B) AND (C or D)?

2010-05-28 Thread efr...@gmail.com

Hi Ahmet,

Thanks for this. So do we need "&defType=lucene&q.op=OR&fl=Title" at
the end?

Also, I'm guessing we will need to install EdgeNGramFilterFactory?

Here are the analyzers / filters we currently are using (just the default
stuff):


Index Analyzer: org.apache.solr.analysis.TokenizerChain

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

   1. org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
   ignoreCase: true enablePositionIncrements: true }
   2. org.apache.solr.analysis.WordDelimiterFilterFactory
   args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1
   generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 }
   3. org.apache.solr.analysis.LowerCaseFilterFactory args:{}
   4. org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
   protwords.txt }
   5. org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}


I am not seeing these from your example in a prev email:








Perhaps they were not necessary and just a part of your example? Again,
thanks for your help.

thanks

Brad

On Fri, May 28, 2010 at 3:43 PM, Ahmet Arslan  wrote:

>
>
> --- On Fri, 5/28/10, efr...@gmail.com  wrote:
>
> > From: efr...@gmail.com 
> > Subject: Re: Does SOLR Allow q= (A or B) AND (C or D)?
> > To: solr-user@lucene.apache.org
> > Date: Friday, May 28, 2010, 4:42 AM
> > Hi Ahmet,
> >
> > Thanks again for the feedback. We will be searching several
> > fields of each
> > object in the index (title, description, tags). The matches
> > on keywords need
> > to be in any of these fields and there will be no different
> > weights.
>
> Okey after investigating your example website, i think i understand you
> now. As a suggestion you are going to display documents, not a particular
> field, or keywords. You can do it with my solution. Create two additional
> fields with the types i wrote. Copy your title, desc and tag fields into
> these fields. Execute the query - on these two field - as the user types.
> You will see the same results with your example website. Just don't forget
> to use quotes for the field that uses keyword tokenizer. And the () for the
> other one. Also default operator OR is required. When the user is typing the
> query tap water, your query will be
> q=f1:"tap wate" f2:(tap wate)   just before the last key stroke.
>
>
>
>

Re: Help with PatternReplaceFilterFactory

You probably want to store these as numbers instead of text.  The
DataImportHandler allows you to take apart text blocks and save the
parts in number fields. The text analysis stack inside basic indexing
will not do this.
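
As an illustration of taking such a text block apart before indexing, here is a small Python sketch. It is plain preprocessing code, not DataImportHandler configuration; the function names are hypothetical, the sample string is the one from the quoted message below, and `low`/`up` are the field names suggested there.

```python
import re

# Two numbers (digits with optional thousands separators) around a hyphen,
# each optionally preceded by the "R" currency prefix.
_RANGE = re.compile(r"R?\s*(\d[\d,]*)\s*-\s*R?\s*(\d[\d,]*)")


def parse_salary_range(text):
    """Return (low, high) as integers, or None if no range is found."""
    match = _RANGE.search(text)
    if match is None:
        return None
    low, high = (int(g.replace(",", "")) for g in match.groups())
    return low, high


def range_query(low, high):
    """Match documents whose stored range lies within [low, high]."""
    return f"low:[{low} TO *] AND up:[* TO {high}]"


print(parse_salary_range("R500,000-550,000 Per Annum"))  # (500000, 550000)
print(range_query(500000, 550000))
```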

On Thu, May 27, 2010 at 7:24 AM, Koji Sekiguchi  wrote:
>
>> Yes you are right, I get that type of result. I guess my wording was
>> wrong.
>> My field looks like this in the index:
>> R500,000-550,000 Per Annum
>> R500,000-550,000 Per Annum
>>
>> How would I search for say salaries in the range of 500,000 - 550,000?
>> Trying fq=Rumeration_strip:500,000-550,00 doesn't bring back anything. I
>> must have something wrong.
>>
>>
>
> So you are not asking facet...
>
> For query syntax of range search, take a look at:
>
> http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Range%20Searches
>
> And you need to index the lower and upper salary to separate fields i.e.
>
> low:50
> up:55
>
> Then you can search the both of the fields e.g.
>
> q=low:[50 TO *] AND up:[* TO 55]
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>



-- 
Lance Norskog
goks...@gmail.com

Re: Need guidance on schema type

Both use the same HTML stripper. The DIH lets you run multiple
documents in parallel in one request if that helps.

On Thu, May 27, 2010 at 9:32 AM, Blargy  wrote:
>
> There will never be any need to search the actual HTML (tags, markup, etc) so
> as far as functionality goes it seems like the DIH HTMLStripTransformer is
> the way to go.
>
> Are there any significant performance differences between the two?
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Need-guidance-on-schema-type-tp846923p848874.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com

Re: Solr trunk and Jetty threadpool implementation problem

Please file a JIRA.

On Thu, May 27, 2010 at 2:43 PM, Smiley, David W.  wrote:
> I'd like to warn people about the default configuration of Jetty in the Solr 
> trunk release (not present in Solr 1.4 and prior).  There is a difference in 
> the jetty configuration which is for the latest Solr to use the 
> QueuedThreadPool (as seen in jetty.xml).  Previously, it had used a 
> BoundedThreadPool implementation that I've heard is considered deprecated 
> presently.  I have a multi-core setup where Jetty is serving up lots of Solr 
> cores 9+ and when our client does a distributed search (3 of them at a time 
> actually), it triggers a condition in which the query takes 50 plus seconds 
> to respond.  During this time, the machine is effectively idle, seemingly 
> waiting for something.  To fix this, go back to the former BoundedThreadPool 
> implementation or don't use Jetty.  FWIW this has triggered us to switch to 
> Tomcat.
>
> Sorry but I have sunk so much resources into tracking down this nasty problem 
> that I can't spend much more on further figuring out why QueuedThreadPool is 
> failing us.
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com

Re: Solr trunk and Jetty threadpool implementation problem

2010-05-28 Thread Yonik Seeley

Wow, thanks for the heads-up David!
This probably got inadvertently changed when Jetty was upgraded...
sounds like we should prob change back to BoundedThreadPool as a
default!

-Yonik
http://www.lucidimagination.com

On Thu, May 27, 2010 at 5:43 PM, Smiley, David W.  wrote:
> I'd like to warn people about the default configuration of Jetty in the Solr 
> trunk release (not present in Solr 1.4 and prior).  There is a difference in 
> the jetty configuration which is for the latest Solr to use the 
> QueuedThreadPool (as seen in jetty.xml).  Previously, it had used a 
> BoundedThreadPool implementation that I've heard is considered deprecated 
> presently.  I have a multi-core setup where Jetty is serving up lots of Solr 
> cores 9+ and when our client does a distributed search (3 of them at a time 
> actually), it triggers a condition in which the query takes 50 plus seconds 
> to respond.  During this time, the machine is effectively idle, seemingly 
> waiting for something.  To fix this, go back to the former BoundedThreadPool 
> implementation or don't use Jetty.  FWIW this has triggered us to switch to 
> Tomcat.
>
> Sorry but I have sunk so much resources into tracking down this nasty problem 
> that I can't spend much more on further figuring out why QueuedThreadPool is 
> failing us.
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
>
>
>

Re: Interleaving the results

There is no interleaving tool. There is a random number tool. You will
have to achieve this in your application.
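
A minimal sketch of doing it in the application, in Python: group the returned documents by customer and take one from each group in turn. The function name and the document shape (dicts with a `customer_id` key) are assumptions; it also only interleaves the documents you have already fetched, so it has to run over the full result set (or a large enough page) to be meaningful.

```python
from collections import OrderedDict, deque


def interleave_by(docs, key="customer_id"):
    """Round-robin the docs across distinct values of `key`, keeping the
    original relative order within each customer."""
    groups = OrderedDict()
    for doc in docs:
        groups.setdefault(doc[key], deque()).append(doc)

    result = []
    while groups:
        for value in list(groups):  # copy keys: we delete while looping
            result.append(groups[value].popleft())
            if not groups[value]:
                del groups[value]
    return result


docs = [
    {"id": 1, "customer_id": "a"},
    {"id": 2, "customer_id": "a"},
    {"id": 3, "customer_id": "b"},
    {"id": 4, "customer_id": "a"},
    {"id": 5, "customer_id": "c"},
]
print([d["id"] for d in interleave_by(docs)])  # [1, 3, 5, 2, 4]
```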

On Fri, May 28, 2010 at 8:23 AM, NarasimhaRaju  wrote:
> Hi,
> how to achieve custom ordering of the documents when there is a general query?
>
> Usecase:
> Interleave documents from different customers one after the other.
>
> Example:
> Say i have 10 documents in the index belonging to 3 customers (customer_id 
> field in the index ) and using query *:*
> so all the documents in the results score the same.
> but i want the results to be interleaved
> one document from the each customer should appear before a document from the 
> same customer repeats ?
>
> is there a way to achieve this ?
>
>
> Thanks in advance
>
> R.
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com

Re: Storing different entities in Solr

The size of the join table is the number of documents, if you
denormalize the two tables.
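
To illustrate, here is a rough Python sketch of flattening a many-to-many relationship into one document per join row. Everything in it is hypothetical: the entity names follow the thread (requests and consultants), but the field names, the prefixing scheme, and the composite `id` are only one possible convention.

```python
def denormalize(requests, consultants, links):
    """Produce one flat document per (request_id, consultant_id) link."""
    docs = []
    for request_id, consultant_id in links:
        doc = {"id": f"{request_id}-{consultant_id}"}  # composite unique key
        for field, value in requests[request_id].items():
            doc[f"request_{field}"] = value
        for field, value in consultants[consultant_id].items():
            doc[f"consultant_{field}"] = value
        docs.append(doc)
    return docs


requests = {"r1": {"title": "Audit"}, "r2": {"title": "Migration"}}
consultants = {"c1": {"name": "Ann"}, "c2": {"name": "Bob"}}
links = [("r1", "c1"), ("r1", "c2"), ("r2", "c2")]

docs = denormalize(requests, consultants, links)
print(len(docs))  # 3, the same as the number of join rows
```

A single composite key like this is one way to keep documents individually updatable and deletable without needing two primary keys.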

On Fri, May 28, 2010 at 11:38 AM, Nagelberg, Kallin
 wrote:
> I agree with Erick,
>
> Could you show us what these two entities look like, and the total count of 
> each? That might shed some light on the appropriate approach.
>
> -Kallin Nagelberg
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, May 28, 2010 2:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Storing different entities in Solr
>
> You most certainly *can* store the many<->many relationship, you
> are just denormalizing your data. I know it goes against the grain
> of any good database admin, but it's very often a good solution
> for a search application.
>
> You've gotta forget almost everything you learned about how data
> *should* be stored in databases when working with a search app.
> Well, perhaps I'm overstating a bit, but you get the idea
>
> When I see messages about primary keys and foreign keys etc, I
> break out in hives. It's almost always a mistake to try to force
> lucene/solr to behave like a database. Whenever you find yourself
> trying, stop, take a deep breath, and think about searching ...
>
> A lot depends on how much data we're talking about here. If
> fully denormalizing things would cost you 10M, who cares? If it
> would cost you 100G, it's a different story
>
> Best
> Erick
>
>
> On Fri, May 28, 2010 at 1:12 PM, Moazzam Khan  wrote:
>
>> Thanks for all your answers guys. Requests and consultants have a many
>> to many relationship so I can't store request info in a document with
>> advisorID as the primary key.
>>
>> Bill's solution and multicore solutions might be what I am looking
>> for. Bill, will I be able to have 2 primary keys (so I can update and
>> delete documents)? If yes, can you please give me a link or something
>> where I can get more info on this?
>>
>> Thanks,
>> Moazzam
>>
>>
>>
>> On Fri, May 28, 2010 at 11:50 AM, Bill Au  wrote:
>> > You can keep different types of documents in the same index.  If each
>> > document has a type field, you can restrict your searches to specific
>> > type(s) of document by using a filter query, which is very fast and
>> > efficient.
>> >
>> > Bill
>> >
>> > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
>> > knagelb...@globeandmail.com> wrote:
>> >
>> >> Multi-core is an option, but keep in mind if you go that route you will
>> >> need to do two searches to correlate data between the two.
>> >>
>> >> -Kallin Nagelberg
>> >>
>> >> -Original Message-
>> >> From: Robert Zotter [mailto:robertzot...@gmail.com]
>> >> Sent: Friday, May 28, 2010 12:26 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Storing different entities in Solr
>> >>
>> >>
>> >> Sounds like you'll want to use a multiple core setup. One core for each
>> >> type
>> >> of "document"
>> >>
>> >> http://wiki.apache.org/solr/CoreAdmin
>> >> --
>> >> View this message in context:
>> >>
>> http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >
>>
>



-- 
Lance Norskog
goks...@gmail.com

Re: Solr trunk and Jetty threadpool implementation problem