Re: Sum of one field
The easiest solution would probably be to facet on the quantity field and calculate the total quantity on the client side.

Svein

On 4. Aug. 2008, at 21.47, Otis Gospodnetic wrote:

Leonardo,

You'd have to read the "quantity" field for all matching documents one way or the other. One way is to get all results and pull that field out, so you can compute the sum. Another way is to hack the SolrIndexSearcher and get this value in one of the HitCollector collect method calls. Another possibility, if your index is fairly static, might be to read the quantity field of all documents (not just matches) and store it in a docID->quantity map structure that lets you look up the quantity for any docID you want.

There may be other/better ways of doing this, but this is what comes to (my) mind first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Leonardo Dias <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, August 4, 2008 1:19:45 PM
Subject: Sum of one field

Everyone shows "your search for x has returned y results" at the top of the results page, but we need something else, something like "your search for x returned y results in z records", where z is the number of documents in the Solr response and y is the SUM(quantity) over all returned records.

In SQL you can do something like:

    SELECT count(1), sum(quantity) FROM table

But with Solr we don't know how to do the same without returning the whole XML result for the field "quantity" and then summing it to show the total.

Any hints on how to do it in a better way?

cheers,
Leonardo
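The facet suggestion at the top of this message can be sketched on the client side roughly as follows. This is only an illustration in Python, assuming the facet on "quantity" comes back as a flat list of alternating value and count (the shape Solr's JSON response uses for facet fields by default), that every distinct value was returned (for example with facet.limit=-1), and that the values are integers. The function name is made up for the example.

```python
def total_from_facet(flat_counts):
    """Sum value * count over a flat [value, count, value, count, ...] list.

    Assumes each facet value is an integer quantity and each count is the
    number of matching documents that carry that quantity.
    """
    values = flat_counts[0::2]
    counts = flat_counts[1::2]
    return sum(int(value) * count for value, count in zip(values, counts))


# Hypothetical facet result for the "quantity" field of one search:
# 120 documents with quantity 1, 40 with quantity 2, 3 with quantity 5.
example = ["1", 120, "2", 40, "5", 3]
print(total_from_facet(example))  # 215
```

The cost grows with the number of distinct quantity values rather than with the number of matching documents, which is why this tends to be cheaper than fetching the field for every hit.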
Re: Replacing FAST functionality at sesam.no
On 27. Aug. 2008, at 19.44, Glenn-Erik Sandbakken wrote:

At sesam.no we want to replace a FAST (fast.no) Query Matching Server with a Solr index. The index we are trying to replace is not a regular index, but specially configured to perform phrase (and sub-phrase) matches against several large lists (like an index with only a 'title' field). I'm not sure of a correct, or logical, name for the behavior we are after, but it is like a combination of shingles and exact matching. Some examples should explain it well.

In order to do this, you can't use the ShingleFilter during indexing, since a document like "one two three" and a query like "one two four" would match because they have the shingle "one two" in common. You will get what you want, I think, if you don't tokenize during indexing (some normalization will be required if your lists aren't normalized to begin with) and apply the ShingleFilter only to the queries.

Svein
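The idea of leaving list entries untokenized and shingling only the query can be illustrated outside Solr. The sketch below is plain Python, not the Lucene ShingleFilter: the function names, the lowercase-and-collapse-whitespace normalization and the use of an in-memory set are all assumptions made for the example.

```python
def normalize(text):
    """Lowercase and collapse whitespace (a stand-in for real normalization)."""
    return " ".join(text.lower().split())


def shingles(query):
    """Return every contiguous word n-gram of the query, shortest first."""
    words = normalize(query).split()
    result = []
    for size in range(1, len(words) + 1):
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


def match_list(query, entries):
    """Return the query shingles that equal a whole (untokenized) list entry."""
    index = {normalize(entry) for entry in entries}
    return [s for s in shingles(query) if s in index]


# "one two three" is stored as a single term, so "one two four" cannot hit it
# through a shared "one two" shingle.
print(match_list("one two four", ["one two three"]))  # []
print(match_list("one two four", ["one two", "four"]))  # ['four', 'one two']
```

Because the entries are never split, a match requires a query shingle to equal a complete entry, which gives the sub-phrase matching described above without the false hit from shared shingles.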
Re: Querying multivalued field - can scoring formula consider only matched values?
On 7. Oct. 2008, at 21.49, abhishek007 wrote:

Hi,

My application needs to handle synonyms for courses. The most natural way to achieve this would be to make the field "course" multivalued. Now, say I add documents like:

John Dane
    Algorithms
    Theory
    Computability, Complexity and Algorithms

Mary Arriaga
    Algorithms for Pattern Matching

Now, if I query for "Algorithms", I get a higher score for document 2 than document 1.

1) I have noticed that this is because the length norm factor of Lucene scoring considers all values of the multivalued field, which reduces the overall score of document 1. How can I avoid this?

2) Is there an alternate way to achieve what I want here? I can think of changing the schema of my index by making the field "course" single-valued and creating separate documents for each synonym for a course. But won't that explode the index size?

One way to boost an exact match of one occurrence of a multivalued field is to add some kind of special start-of-field token and end-of-field token in the data, e.g.:

John Dane
    softok Algorithms eoftok
    softok Theory eoftok
    softok Computability, Complexity and Algorithms eoftok

Then, in your query you can boost hits with the complete phrase "softok queryword eoftok" by doing something like

    queryword OR "softok queryword eoftok"^10

If you want to boost shorter fields in general and not only exact matches, add some slop (distance) to the phrase part. Of course, this will have a cost in performance.

Could any of you Lucene experts out there explain to me why it isn't possible to do field boosting per occurrence? I know Solr doesn't support it because Lucene doesn't, but I can't figure out the underlying reason. I think even a per-token kind of boosting (e.g. supporting something like foobar^10 at indexing time) should be easy to implement in the Lucene relevance model and would be very useful.

Svein
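The marker-token trick above can be sketched as two small string helpers: one that wraps each field value before indexing and one that builds the boosted query. This is an illustration in Python only. The function names are invented, the boost of 10 is the one from the message, and no escaping of query-syntax characters is attempted.

```python
START, END = "softok", "eoftok"  # marker tokens from the message above


def wrap_value(value):
    """Surround one value of the multivalued field with the marker tokens."""
    return f"{START} {value} {END}"


def boost_exact_value(word, boost=10, slop=0):
    """Build: word OR "softok word eoftok"^boost, with optional phrase slop."""
    phrase = f'"{START} {word} {END}"'
    if slop:
        phrase += f"~{slop}"
    return f"{word} OR {phrase}^{boost}"


courses = ["Algorithms", "Theory", "Computability, Complexity and Algorithms"]
print([wrap_value(c) for c in courses])
print(boost_exact_value("Algorithms"))
# Algorithms OR "softok Algorithms eoftok"^10
print(boost_exact_value("Algorithms", slop=3))
# Algorithms OR "softok Algorithms eoftok"~3^10
```

With slop 0 only a value consisting of exactly the query word gets the boost. A non-zero slop also rewards short values that contain a few extra tokens, at the performance cost mentioned in the message.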
Re: Querying multivalued field - can scoring formula consider only matched values?
On Oct 13, 2008, at 9:34 PM, abhishek007 wrote:

Svein Parnas-2 wrote:

One way to boost an exact match of one occurrence of a multivalued field is to add some kind of special start-of-field token and end-of-field token in the data, e.g.:

John Dane
    softok Algorithms eoftok
    softok Theory eoftok
    softok Computability, Complexity and Algorithms eoftok

Then, in your query you can boost hits with the complete phrase "softok queryword eoftok" by doing something like

    queryword OR "softok queryword eoftok"^10

I see what you are saying, but what if the query string itself contains multiple synonyms, for example something like "Algorithms, Theory"? With this I would end up having "softok Algorithms, Theory eoftok", which would not match the indexed data.

I was just trying to point you in a direction, not give a complete solution.

For multiword queries, the solution will depend on the query syntax you are going to support and how you want the ranking to be performed.

For instance, if the interpretation of a simple two-word query is "both words required, boost short field occurrences before long ones, but sort those hits where both words occur in the same field occurrence first", the query could be rewritten to

    +"softok wordA eoftok"~n +"softok wordB eoftok"~n "wordA wordB"~n^50

where n is about the number of tokens in the longest occurrence of the field in the index, but less than the field's positionIncrementGap.

The query parsing might get a bit messy if you are going to support advanced syntax. If the syntax you are going to support is about the same as DisMax, it could be an idea to modify DisMaxRequestHandler.

Another way to go would be to use DisMax as is, find all query terms not prefixed with - in the query and add "softok word eoftok"~n to the bq parameter.

Svein
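The two rewrites described in this reply can be sketched as plain string builders. This is Python used for illustration only: the function names are invented, the slop is passed in as a number the caller has to choose (about the longest field occurrence, below the position increment gap), and the whitespace splitting in the DisMax variant is a deliberately naive assumption that ignores quoted phrases and field prefixes.

```python
START, END = "softok", "eoftok"  # marker tokens used in the indexed data


def rewrite_query(words, slop, boost=50):
    """Require every word inside some field occurrence, and boost hits where
    all words fall within the same occurrence."""
    required = [f'+"{START} {word} {END}"~{slop}' for word in words]
    joined = " ".join(words)
    together = f'"{joined}"~{slop}^{boost}'
    return " ".join(required + [together])


def dismax_bq(query, slop):
    """Build bq clauses for every query term that is not prefixed with '-'."""
    terms = [t.lstrip("+") for t in query.split() if not t.startswith("-")]
    return [f'"{START} {term} {END}"~{slop}' for term in terms if term]


print(rewrite_query(["wordA", "wordB"], slop=20))
# +"softok wordA eoftok"~20 +"softok wordB eoftok"~20 "wordA wordB"~20^50
print(dismax_bq("+solr -fast lucene", slop=20))
# ['"softok solr eoftok"~20', '"softok lucene eoftok"~20']
```

Keeping the slop below the position increment gap is what stops the phrase clauses from matching across two different values of the same field.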
Re: exact field match
Another solution is to put a special token at the start and end of every occurrence of the field, e.g. aastartaa in front and zzendzz at the end (a solution that resembles FAST's boundary match feature under the hood). You could then search for an exact match ("aastartaa your phrase zzendzz"), and you would also get support for 'begins with' ("aastartaa your phrase"), 'ends with' ("your phrase zzendzz") and, if you need it, boosting of short field values ("aastartaa your phrase zzendzz"~1) - a feature several others have asked for earlier on this list.

Svein

On 26. Jan. 2009, at 20.29, Erick Erickson wrote:

You need to index and search using something like KeywordAnalyzer. That analyzer does no tokenizing, data transformation or the like. For instance, it doesn't fold case.

You will be unable to search for "bond" and get a hit in this case, so one solution is to use two fields, and search one or the other depending upon your needs, e.g.

    myField
    myFieldTokenized

Each field gets a complete copy of the data, and you search "myField" in the case you're describing and myFieldTokenized when you want to match on "bond". Of course, if you never want a hit on "bond", you don't need the Tokenized field.

Best
Erick

On Mon, Jan 26, 2009 at 2:15 PM, Antonio Zippo wrote:

Hi all,

I'm using a string field named "myField" and 2 documents containing:

1. myField="my name is james bond"
2. myField="james bond"

If I use a query like this:

    myField:"james bond"

it returns 2 documents. How can I get only the second document using a string or text field?

I need to find the document with the exact value, not documents containing the exact phrase in the value.

Thanks in advance
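The boundary-token approach from the top of this message can be tried out with a toy phrase matcher. The sketch below is Python and does not use Lucene: `phrase_hits` is a made-up stand-in for a phrase query that simply looks for a contiguous run of whitespace-separated tokens, with no analysis or slop.

```python
START, END = "aastartaa", "zzendzz"  # boundary tokens from the message above


def index_value(value):
    """Text to index for one field value, wrapped in boundary tokens."""
    return f"{START} {value} {END}"


def exact(phrase):
    return f'"{START} {phrase} {END}"'


def begins_with(phrase):
    return f'"{START} {phrase}"'


def ends_with(phrase):
    return f'"{phrase} {END}"'


def phrase_hits(phrase_query, indexed_text):
    """Toy phrase query: True if the phrase tokens occur contiguously."""
    phrase = phrase_query.strip('"').split()
    tokens = indexed_text.split()
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))


docs = ["my name is james bond", "james bond"]
indexed = [index_value(d) for d in docs]
print([phrase_hits(exact("james bond"), t) for t in indexed])        # [False, True]
print([phrase_hits(begins_with("james bond"), t) for t in indexed])  # [False, True]
print([phrase_hits(ends_with("james bond"), t) for t in indexed])    # [True, True]
```

For the two documents in the original question, the exact form selects only the second one, while the field stays tokenized and can still be searched for single words such as "bond".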
Re: Embedded Solr search query
Or send the queries in parallel from the PHP script (use cURL).

Svein

2010/5/7 caman:
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?
>
> From: Eric Grobler [via Lucene]
> [mailto:ml-node+783150-1027691461-124...@n3.nabble.com]
> Sent: Friday, May 07, 2010 1:01 AM
> To: caman
> Subject: Embedded Solr search query
>
> Hello Solr community,
>
> When a user searches on our web page, we need to run 3 related but different
> queries.
> For SEO reasons, we cannot use Ajax, so at the moment we run 3 queries
> sequentially inside a PHP script.
> Although Solr is superfast, the extra network overhead can make the 3
> queries 400ms slower than they need to be.
>
> Thus my question is:
> Is there a way whereby you can send 1 query string to Solr with 2 or more
> embedded search queries, where Solr will split and execute the queries and
> return the results of the multiple searches in 1 go?
>
> In other words, instead of:
> - send searchQuery1
>   get result1
> - send searchQuery2
>   get result2
> ...
>
> you run:
> - send searchQuery1+searchQuery2
> - get result1+result2
>
> Thanks and Regards
> Eric
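The suggestion at the top of this message is to fire the requests in parallel from the PHP script with cURL. The same idea is sketched below in Python rather than PHP, using a thread pool. The function names are invented, the select URL in the usage comment is an assumed local Solr address, and only the `q` parameter is sent.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch(url):
    """Fetch one URL and return the raw response body."""
    with urlopen(url, timeout=5) as response:
        return response.read()


def run_parallel(base_url, queries, fetch_fn=fetch):
    """Send one request per query at the same time.

    Results come back in the same order as `queries`. The total wait is
    roughly that of the slowest request instead of the sum of all of them.
    """
    urls = [f"{base_url}?{urlencode({'q': query})}" for query in queries]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fetch_fn, urls))


# Usage against an assumed local Solr instance:
# results = run_parallel("http://localhost:8983/solr/select",
#                        ["searchQuery1", "searchQuery2", "searchQuery3"])
```

This keeps the three queries separate on the Solr side, so no custom request handler is needed. The trade-off is three connections per page view instead of one.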
Re: SOLR X FAST
On Dec 12, 2007, at 2:50 AM, Nuno Leitao wrote:

FAST uses two pipelines - an ingestion pipeline (for document feeding) and a query pipeline - both of which are fully programmable (i.e., you can customize them fully). At ingestion time you typically prepare documents for indexing (tokenize, character normalize, lemmatize, clean up text, perform entity extraction for facets, perform static boosting for certain documents, etc.), while at query time you can expand synonyms and do other general query-side tasks (not unlike Solr).

Horizontal scalability means the ability to cluster your search engine across a large number of servers, so you can scale up on the number of documents, queries, crawls, etc. There are FAST deployments out there which run on dozens, in some cases hundreds, of nodes, serving multiple-terabyte indexes and achieving hundreds of queries per second.

Yet again, if your requirements are relatively simple then Lucene might do the job just fine.

Hope this helps.

With Fast, you will also get things like:

- categorization
- clustering
- more flexible collapsing / grouping
- more scalable facets (navigators), at least for multivalued fields
- gigabytes of poorly documented software
- operations from hell
- a huge number of bugs
- high bills, both for software and hardware

As for linguistic features (named entity extraction, dictionary-based lemmatization and so on) and things like categorization and clustering, they should not be expected to work too well unless you put a huge amount of work into them, and some of the features are really primitive.

To sum up, if Solr meets your needs I would highly recommend Solr. If you need some additional features and have the knowledge, integrate other products with Solr. If you really need the scalability, go for Fast or some other commercial software.

As for document preprocessing and connectors for Solr, if you need them, you could have a look at OpenPipe, http://openpipe.berlios.de/ (not yet announced).

Svein