Re: Sum of one field

2008-08-06 Thread Svein Parnas
The easiest solution would probably be to have a facet on the quantity  
field and calculate the total quantity on the client side.
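
A minimal sketch of the client-side part, assuming the facet on "quantity" comes back as Solr's flat JSON list [value, count, value, count, ...] and that every distinct value is returned (for example with facet.limit=-1). The function name and the sample numbers are made up for illustration.

```python
def sum_from_facet(flat_counts):
    """Sum value * count over a flat [value, count, value, count, ...] list."""
    values = flat_counts[0::2]   # the distinct quantity values
    counts = flat_counts[1::2]   # how many matching documents have each value
    return sum(int(value) * int(count) for value, count in zip(values, counts))


# Hypothetical facet counts for the "quantity" field of one response.
facet = ["1", 120, "2", 45, "5", 3]
total_quantity = sum_from_facet(facet)
```

This only stays cheap while the number of distinct quantity values is small; with many distinct values the facet response grows accordingly.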


Svein

On 4 Aug 2008, at 21:47, Otis Gospodnetic wrote:


Leonardo,
You'd have to read the "quantity" field for all matching documents
one way or the other.
One way is to fetch all results and pull that field out, so you
can compute the sum.
Another way is to hack the SolrIndexSearcher and accumulate the value in
the HitCollector's collect method calls.
Another possibility, if your index is fairly static, might be to
read the quantity field of all documents (not just the matches) and store
it in a docID->quantity map structure that lets you look up the
quantity for any docID you want.
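
The map idea can be sketched roughly like this, outside of Solr and with plain Python data. The names and the sample values are hypothetical; in a real setup the map would be filled from the index and rebuilt whenever the index changes.

```python
def sum_quantity(matching_doc_ids, quantity_by_doc_id):
    """Add up the stored quantity for each matching docID (missing IDs count as 0)."""
    return sum(quantity_by_doc_id.get(doc_id, 0) for doc_id in matching_doc_ids)


# Hypothetical map built once for all documents in a fairly static index.
quantity_by_doc_id = {0: 3, 1: 10, 2: 1, 3: 7}

# Hypothetical docIDs matched by one query.
total = sum_quantity([1, 3], quantity_by_doc_id)
```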



There may be other/better ways of doing this, but this is what comes  
to (my) mind first.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----

From: Leonardo Dias <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, August 4, 2008 1:19:45 PM
Subject: Sum of one field

Everyone shows "your search for x has returned y results" at the top
of the results page, but we need something else, which would be
something like "your search for x returned y results in z records",
with z being the numdocs of the SOLR response and y a SUM(quantity) over all
returned records.

In SQL you can do something like:

SELECT count(1), sum(quantity) FROM table

But with SOLR we don't know how to do the same without having to
return the full XML result for the field "quantity" and then sum it to
show the total. Any hints on how to do this in a better way?



cheers,

Leonardo






Re: Replacing FAST functionality at sesam.no

2008-08-27 Thread Svein Parnas


On 27 Aug 2008, at 19:44, Glenn-Erik Sandbakken wrote:


At sesam.no we want to replace a FAST (fast.no) Query Matching Server
with a Solr index.

The index we are trying to replace is not a regular index, but is specially
configured to perform phrase (and sub-phrase) matches against several
large lists (like an index with only a 'title' field).

I'm not sure of a correct, or logical, name for the behavior we are
after, but it is like a combination of Shingles and exact matching.


Some examples should explain it well.


In order to do this, you can't use the ShingleFilter during indexing,
because a document like "one two three" and a query like "one two four"
would then match: they have the shingle "one two" in common.


You will get what you want, I think, if you don't tokenize during
indexing (some normalization will be required if your lists aren't
normalized to begin with) and apply the ShingleFilter only to the
queries.
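
To illustrate the idea outside of Solr, here is a small Python sketch: list entries are kept as whole, normalized strings, and only the query is split into shingles. The helper names, the normalization (lowercasing and whitespace collapsing) and the maximum shingle size are assumptions for the example, not what the ShingleFilter does internally.

```python
def normalize(text):
    """Lowercase and collapse whitespace; stands in for real normalization."""
    return " ".join(text.lower().split())


def shingles(query, max_size=3):
    """Return all word n-grams of the query, from 1 up to max_size words."""
    words = normalize(query).split()
    result = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


def match(query, entries, max_size=3):
    """Return the query shingles that equal a complete (untokenized) list entry."""
    index = {normalize(entry) for entry in entries}
    return [s for s in shingles(query, max_size) if s in index]


entries = ["one two three", "Oslo"]
no_hit = match("one two four", entries)             # no shared entry
hit = match("hotels in one two three", entries)     # full entry is a query shingle
```

Because the entries are never broken into shingles, "one two four" no longer matches "one two three" through the common shingle "one two".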


Svein



Re: Querying multivalued field - can scoring formula consider only matched values?

2008-10-11 Thread Svein Parnas


On 7 Oct 2008, at 21:49, abhishek007 wrote:



Hi,
My application needs to handle synonyms for courses. The most natural way to
achieve this would be to make the field "course" multivalued.

Now, say I add documents  like:


 John Dane
 Algorithms
 Theory
 Computability, Complexity and Algorithms



 Mary Arriaga
 Algorithms for Pattern Matching


Now, if I query for "Algorithms", I get a higher score for document 2 than
document 1.

1) I have noticed that this is because the length norm factor of Lucene scoring
considers all values of the multivalued field, which reduces the overall
score of document 1. How can I avoid this?

2) Is there an alternate way to achieve what I want here? I can think of
changing the schema of my index by making the field "course"
single-valued and creating separate documents for each synonym of a course.
But won't that explode the index size?



One way to boost exact match of one occurrence of a multivalued field  
is to add some kind of special start-of-field token and end-of-field  
token in the data, eg:



 John Dane
 softok Algorithms eoftok
 softok Theory eoftok
 softok Computability, Complexity and Algorithms eoftok



Then, in your query you can boost hits with the complete phrase  
"softok queryword eoftok" by doing something like


queryword OR "softok queryword eoftok"^10

If you want to boost shorter fields in general and not only exact
matches, add some distance (slop) to the phrase part.


Of course, this will have a cost with regards to performance.
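
A rough sketch of how a client could build such a query string. The function name and defaults are made up for the example, and it assumes the query word contains no characters that need escaping in the Lucene query syntax.

```python
def boosted_query(word, boost=10, slop=0, start="softok", end="eoftok"):
    """Build: word OR "softok word eoftok"^boost, with optional phrase slop."""
    phrase = f'"{start} {word} {end}"'
    if slop > 0:
        # A slop > 0 also rewards short (not only exactly matching) field values.
        phrase += f"~{slop}"
    return f"{word} OR {phrase}^{boost}"


exact_only = boosted_query("Algorithms")
with_slop = boosted_query("Algorithms", slop=5)
```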

Could any of you Lucene experts out there explain to me why it isn't
possible to do field boosting per occurrence? I know Solr doesn't
support it because Lucene doesn't, but I can't figure out the
underlying reason. I think even a per-token kind of boosting (e.g.
supporting something like foobar^10 at indexing time) should be easy to
implement in the Lucene relevance model and would be very useful.


Svein



Re: Querying multivalued field - can scoring formula consider only matched values?

2008-10-14 Thread Svein Parnas


On Oct 13, 2008, at 9:34 PM, abhishek007 wrote:




Svein Parnas-2 wrote:



One way to boost exact match of one occurrence of a multivalued field
is to add some kind of special start-of-field token and end-of-field
token in the data, eg:


 John Dane
 softok Algorithms eoftok
 softok Theory eoftok
 softok Computability, Complexity and Algorithms eoftok


Then, in your query you can boost hits with the complete phrase
"softok queryword eoftok" by doing something like

queryword OR "softok queryword eoftok"^10




I see what you are saying, but what if the query string itself contains
multiple synonyms, for example something like "Algorithms, Theory"?
With this I would end up having "softok Algorithms, Theory eoftok", which would
not match the indexed data.


I was just trying to point you in a direction, not giving a complete
solution. For multiword queries, the solution will depend on the query
syntax you are going to support and how you want the ranking to be
performed. For instance, if the interpretation of a simple two-word
query is "both words required, boost short field occurrences
over long ones, but sort hits where both words occur in the same
field occurrence first", the query could be rewritten to


+"softok wordA eoftok"~N +"softok wordB eoftok"~N "wordA wordB"~N^50


where N is about the number of tokens in the longest occurrence of
the field in the index, but less than the field's positionIncrementGap.


The query parsing might get a bit messy if you are going to support
advanced syntax. If the syntax you are going to support is about the
same as DisMax, it could be an idea to modify DisMaxRequestHandler.
Another way to go would be to use DisMax as is, find all query terms
not prefixed with - in the query and add "softok word eoftok"~N (with N
as the phrase slop) for each of them to the bq parameter.


Svein



Re: exact field match

2009-01-26 Thread Svein Parnas
Another solution is to put a special token at the start and end of every
occurrence of the field, e.g. aastartaa in front and zzendzz at the end (a
solution resembling FAST's boundary match feature under the hood).
You could then search for an exact match ("aastartaa your phrase
zzendzz"), and you would also get support for 'begins
with' ("aastartaa your phrase"), 'ends with' ("your phrase zzendzz")
and, if you need it, boosting of short field values ("aastartaa your
phrase zzendzz"~1), a feature several others have asked for
earlier on this list.
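
A small sketch of both sides of this, assuming the marker tokens survive analysis unchanged and never occur in real data. The helper names are made up, and the phrase is assumed to need no query-syntax escaping.

```python
START, END = "aastartaa", "zzendzz"


def wrap_value(value):
    """Index-time side: surround one field value with the boundary tokens."""
    return f"{START} {value} {END}"


def boundary_queries(field, phrase):
    """Query-time side: phrase queries for exact, begins-with and ends-with."""
    return {
        "exact": f'{field}:"{START} {phrase} {END}"',
        "begins_with": f'{field}:"{START} {phrase}"',
        "ends_with": f'{field}:"{phrase} {END}"',
    }


indexed = wrap_value("james bond")
queries = boundary_queries("myField", "james bond")
```

With values indexed this way, the exact query matches "james bond" but not "my name is james bond", while the ends-with query matches both.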


Svein

On 26 Jan 2009, at 20:29, Erick Erickson wrote:


You need to index and search using something like
KeywordAnalyzer. That analyzer does no tokenizing/
data transformation or such. For instance, it
doesn't fold case.

You will be unable to search for "bond" and get a hit
in this case, so one solution is to use two fields, and
search one or the other depending upon your needs.
e.g.
myField
myFieldTokenized

Each field gets a complete copy of the data, and you search
"myField" in the case you're describing and
myFieldTokenized when you want to match on "bond".

Of course, if you never want a hit on "bond", you don't need the
Tokenized field.

Best
Erick

On Mon, Jan 26, 2009 at 2:15 PM, Antonio Zippo wrote:



Hi all,

I'm using a string field named "myField"
and 2 documents containing:
1. myField="my name is james bond"
2. myField="james bond"

If I use a query like this:
myField:"james bond" it returns 2 documents.

How can I get only the second document using a string or text field? I need
to find documents with the exact value, not documents containing the
exact phrase in the value.

thanks in advance







Re: Embedded Solr search query

2010-05-07 Thread Svein Parnas
Or send the queries in parallel from the PHP script (use cURL).
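
The same idea sketched in Python rather than PHP, just to show the shape of it: the three requests are sent concurrently, so the total wait is roughly that of the slowest query instead of the sum. The Solr URL, the function names and the use of wt=json are assumptions for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the Solr select handler; adjust to the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def fetch(params):
    """Run one Solr query and return the parsed JSON response."""
    url = f"{SOLR_SELECT_URL}?{urlencode({**params, 'wt': 'json'})}"
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def run_parallel(queries, fetch_fn=fetch):
    """Send all queries concurrently; results come back in the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        return list(pool.map(fetch_fn, queries))
```

Usage would be something like run_parallel([{"q": "title:foo"}, {"q": "author:bar"}, {"q": "tag:baz"}]), where the three queries are placeholders.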

Svein


2010/5/7 caman :
>
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?
>
>
>
>
>
>
>
> From: Eric Grobler [via Lucene]
> [mailto:ml-node+783150-1027691461-124...@n3.nabble.com]
> Sent: Friday, May 07, 2010 1:01 AM
> To: caman
> Subject: Embedded Solr search query
>
>
>
> Hello Solr community,
>
> When a user searches on our web page, we need to run 3 related but different
> queries.
> For SEO reasons, we cannot use Ajax so at the moment we run 3 queries
> sequentially inside a PHP script.
> Although Solr is superfast, the extra network overhead can make the 3
> queries 400ms slower than they need to be.
>
> Thus my question is:
> Is there a way whereby you can send 1 query string to Solr with 2 or more
> embedded search queries, where Solr will split and execute the queries and
> return the results of the multiple searches in 1 go?
>
> In other words, instead of:
> -  send searchQuery1
>   get result1
> -  send searchQuery2
>   get result2
> ...
>
> you run:
> - send searchQuery1+searchQuery2
> - get result1+result2
>
> Thanks and Regards
> Eric
>


Re: SOLR X FAST

2007-12-12 Thread Svein Parnas


On Dec 12, 2007, at 2:50 AM, Nuno Leitao wrote:



FAST uses two pipelines - an ingestion pipeline (for document  
feeding) and a query pipeline which are fully programmable (i.e.,  
you can customize it fully). At ingestion time you typically prepare  
documents for indexing (tokenize, character normalize, lemmatize,  
clean up text, perform entity extraction for facets, perform static  
boosting for certain documents, etc.), while at query time you can  
expand synonyms, and do other general query side tasks (not unlike  
Solr).


Horizontal scalability means the ability to cluster your search  
engine across a large number of servers, so you can scale up on the  
number of documents, queries, crawls, etc.


There are FAST deployments out there which run on dozens, in some
cases hundreds of nodes serving multiple terabyte size indexes and
achieving hundreds of queries per second.


Yet again, if your requirements are relatively simple then Lucene  
might do the job just fine.


Hope this helps.


With Fast, you will also get things like:
- categorization
- clustering
- more flexible collapsing / grouping
- more scalable facets (navigators) - at least for multivalued fields
- gigabytes of poorly documented software
- operations from hell
- a huge number of bugs
- high bills, both for software and hardware.

As for linguistic features (named entity extraction, dictionary based
lemmatization and so on) and things like categorization / clustering
etc, things should not be expected to work too well unless you put a
huge amount of work into it, and some of the features are really
primitive.


To sum up, if Solr meets your needs I would highly recommend Solr. If  
you need some additional features and have the knowledge, integrate  
other products with Solr. If you really need the scalability, go for  
Fast or some other commercial software.


As for document preprocessing and connectors for Solr, if you need it,  
you could have a look at OpenPipe, http://openpipe.berlios.de/ (not  
yet announced).


Svein