Re: Ranking based on match position in field

2014-07-30 Thread Thomas Michael Engelke
Hi, thanks for the link. I've upgraded from 4.7, which we were using, to the recent 4.9 version. I've tried to use the new feature with this query in the admin interface using edismax: description:Kühler^~1^5 However, the result seems to stay the same: description:Kühler~1^5 description:Kühler~1^5 (+de

Re: Query on Facet

2014-07-30 Thread Alexandre Rafalovitch
Now it sounds like maybe you have nested facets as opposed to just different ones. See if one of these fits your use case better: http://wiki.apache.org/solr/HierarchicalFaceting Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-

Querying from solr shards

2014-07-30 Thread Smitha Rajiv
Hi All, Currently I am using the legacy Solr distributed configuration (not SolrCloud; a single Solr server with multiple shards). I need to write a query to get one particular document (by ID) from one shard and all documents from the other shards. Can you please help me get this query right?

Re: Query on Facet

2014-07-30 Thread Smitha Rajiv
Hi All, We have tried both the exclude option and facet queries. Neither approach gives us the desired results. I will explain a little further. I have first-level facets, Paperback and Ebook, and second-level facets that include a list of languages such as English, French, etc. When user select

Re: Search result at next component

2014-07-30 Thread Lee Chunki
Hi Ahmet, it’s working :) Thank you Chunki. On Jul 31, 2014, at 7:48 AM, Ahmet Arslan wrote: > Hi Lee, > > You can use : > final DocList docList = rb.getResults().docList; > > And if you want to access individual field values, use solrpluginutils' > static method to obtain SolrDocumentList

Re: Index a time/date range

2014-07-30 Thread david.w.smi...@gmail.com
The wiki page on the technique cleans up some small errors from Hoss’s presentation: http://wiki.apache.org/solr/SpatialForTimeDurations But please try Solr trunk which has first-class support for date durations: https://issues.apache.org/jira/browse/SOLR-6103 Soonish I’ll back-port to 4x. ~ Davi

Re: Character encoding problems

2014-07-30 Thread Gulliver Smith
Thanks for all the replies - I should have made clear that the first thing I did was confirm that everything on the PHP side is UTF-8. The web pages, the input text, the input files etc. The browser confirms that the encoding is UTF-8 for all of the web pages, the response headers as inspected by t

Re: Index a time/date range

2014-07-30 Thread Jost Baron
Hi Ryan, On 07/31/2014 01:26 AM, Ryan Cutter wrote: > Is there a way to index time or date ranges? That is, assume 2 > docs: > > #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01 > > Would there be a way to index #2's date as a single

Re: Index a time/date range

2014-07-30 Thread Alexandre Rafalovitch
For fancier versions, some people used geo coordinates to represent start on X axis and stop on Y. Then use perimeter bounds to do overlaps. There was a discussion on the list about that a while ago. Regards, Alex On 31/07/2014 6:26 am, "Ryan Cutter" wrote: > Is there a way to index time or
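
The start-on-X, stop-on-Y idea can be sketched in plain Python, without any Solr API. This is only an illustration of the geometry: each range becomes a 2D point, and "overlaps the query range" becomes a bounding-box test. The helper names (`to_point`, `overlap_box`, `in_box`) and the use of day ordinals as coordinates are assumptions made for this sketch, not anything taken from the thread or from Solr.

```python
from datetime import date


def to_point(start, end):
    """Map a date range to a 2D point: x = start day, y = end day."""
    if end < start:
        raise ValueError("end must not precede start")
    return (start.toordinal(), end.toordinal())


def overlap_box(q_start, q_end):
    """Box ((x_min, x_max), (y_min, y_max)) that contains every range
    overlapping the query range: start <= q_end and end >= q_start."""
    return (
        (date.min.toordinal(), q_end.toordinal()),
        (q_start.toordinal(), date.max.toordinal()),
    )


def in_box(point, box):
    """True if the point lies inside the box (bounds inclusive)."""
    (x_min, x_max), (y_min, y_max) = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


# The two documents from the original question.
doc1 = to_point(date(2014, 1, 1), date(2014, 1, 1))
doc2 = to_point(date(2014, 2, 1), date(2014, 5, 1))

# Query: everything that overlaps March 2014.
march = overlap_box(date(2014, 3, 1), date(2014, 3, 31))
matches = [name for name, p in (("doc1", doc1), ("doc2", doc2)) if in_box(p, march)]
```

A single-day document is simply a point on the diagonal (start equals end), so both kinds of document fit in the same field.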

Index a time/date range

2014-07-30 Thread Ryan Cutter
Is there a way to index time or date ranges? That is, assume 2 docs: #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01 Would there be a way to index #2's date as a single field and have all the search options you usually get with time/date? One strategy could be to index the start

Setting a Key/Tag/Label for each group.query Result Set

2014-07-30 Thread Carlos Maroto
Hi, I'm trying to get results in a single Solr call through multiple group.query definitions. I'm getting the results I want but, each group is presented under a "name" consisting of the query used for that group. I'd like to change the "name" of each group to some meaningful name instead. I'm

Re: Search result at next component

2014-07-30 Thread Ahmet Arslan
Hi Lee, You can use: final DocList docList = rb.getResults().docList; And if you want to access individual field values, use SolrPluginUtils' static method to obtain a SolrDocumentList: SolrDocumentList solrDocs = docListToSolrDocumentList(rb.getResults().docList, req.getSearcher(), fields); Ah

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Erick Erickson
bq: Is there a way to search the global copyField but highlight the original stored fields? That's what I was suggesting. Specify the global field for your search, but use hl.fl for fields you want to copy. And yes, storing the fields is required for highlighting. Consider stemming (or worse, ph
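
As a rough sketch of the request this describes, the parameters could be assembled like this in Python. The field names `text_all`, `title` and `body` are made up for the example; `df`, `hl` and `hl.fl` are standard Solr request parameters. It relies on `hl.requireFieldMatch` being left at its default of false, so terms matched in the catch-all field are still highlighted in the stored fields.

```python
from urllib.parse import urlencode


def build_highlight_params(user_query, search_field, highlight_fields):
    """Search one catch-all field, highlight the original stored fields."""
    return {
        "q": user_query,
        "df": search_field,                   # default field to search
        "hl": "true",
        "hl.fl": ",".join(highlight_fields),  # stored fields to highlight
    }


# Hypothetical field names, for illustration only.
params = build_highlight_params("cooling fan", "text_all", ["title", "body"])
query_string = urlencode(params)
```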

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
The slowdown occurs during search, not highlighting. Having a disjunctive query with 50 terms running 20 different posting lists is a hard task. Harder than searching these 50 terms on a single (larger) posting list as in the copyField case. With the edismax qf param, sure, hl.fl=* works as it sho

Re: Avoiding indexing of hidden folders and files

2014-07-30 Thread Ahmet Arslan
Hi Ameya, Did you mean to post to the ManifoldCF user mailing list? Or are you referring to the "java -jar post.jar" utility? Ahmet On Wednesday, July 30, 2014 11:15 PM, Ameya Aware wrote: Hi, I noticed that Solr indexes all the folders and files including hidden files. Can anyone help me with avoidin

Avoiding indexing of hidden folders and files

2014-07-30 Thread Ameya Aware
Hi, I noticed that Solr indexes all folders and files, including hidden ones. Can anyone help me avoid indexing hidden files? Thanks, Ameya

Re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Sujit Pal
Hi Eugene, In a system we built a couple of years ago, we had a mixed corpus of English and French (with Spanish on the way, but that was implemented by the client after we handed off). We had different fields for each language. So (title, body) for English docs was (title_en, body_en), for French (title_
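
The per-language field naming could look roughly like the Python sketch below, run on the client before the document is sent to Solr. The function name `localize_fields`, the `SUPPORTED` set and the sample document are assumptions for illustration; only the `_en` / `_fr` suffix convention comes from the message above.

```python
SUPPORTED = {"en", "fr"}  # assumed set of languages with their own analyzers


def localize_fields(doc, lang, text_fields=("title", "body")):
    """Rename text fields to their language-specific variants,
    e.g. title -> title_fr, leaving other fields (such as id) untouched."""
    if lang not in SUPPORTED:
        raise ValueError(f"no language-specific fields for {lang!r}")
    out = {}
    for name, value in doc.items():
        target = f"{name}_{lang}" if name in text_fields else name
        out[target] = value
    return out


french_doc = localize_fields(
    {"id": "42", "title": "Bonjour", "body": "Le texte"}, "fr"
)
```

Each suffixed field can then be bound in the schema to a field type with the stemmer for that language.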

Re: Copy existing index from standalone Solr to Solr cloud

2014-07-30 Thread avgxm
Used the admin/collections?action=SPLITSHARD, to create shard1_0, shard1_1, and then followed this thread http://lucene.472066.n3.nabble.com/How-can-you-move-a-shard-from-one-SolrCloud-node-to-another-td4106815.html to move the shards to the right nodes. Problem solved. -- View this message in

Exception : Processing of multipart/form-data request failed.

2014-07-30 Thread Ameya Aware
Hi, I am getting the exception "Processing of multipart/form-data request failed." My solrconfig.xml contains: Please find below the stack trace. ERROR - 2014-07-30 13:52:05.013; org.apache.solr.common.SolrException; null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:

re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Chris Morley
I know BasisTech.com has a plugin for elasticsearch that extends stemming/lemmatization to work across 40 natural languages. I'm not sure what they have for Solr, but I think something like that may exist as well. Cheers, -Chris. From: "Eugene" Sent: W

Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Eugene
Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect language automatically and use Google Translate APIs a lot (so having arbitrary number of languages in our system doesn't concern us). However we need to be able

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
You're right. I misunderstood. I thought that you wanted to optimize the "finding by id" path which is typically done for comparing versions during inserts in Solr. Yes, it won't help with the case where the ID does not exist. On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen wrote: > Hi > > I a

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 10:00 AM, Shawn Heisey wrote: > It may turn out that this is actually a bug in merging, where old > segments are not getting deleted. I noticed in the optimized index that > there is a single large segment of about 20GB and a bunch of other > segments that are all older than the singl

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 9:16 AM, Shawn Heisey wrote: > On 7/30/2014 9:10 AM, Erick Erickson wrote: >> I assume you've optimized? Or otherwise ensured that there aren't >> any deleted docs > It's all straight indexing with DIH from MySQL, so there really are no > deleted docs, but about an hour after the r

RE: Searching words with spaces for word without spaces in solr

2014-07-30 Thread Dyer, James
In addition to the analyzer configuration you're using, you might want to also use WordBreakSolrSpellChecker to catch possible matches that can't easily be solved through analysis. For more information, see the section for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking

Re: Tika analyzers

2014-07-30 Thread Alexandre Rafalovitch
Solr effectively supports only one binary document that gets indexed. This is because you are not actually indexing the document. You are extracting metadata (e.g. Author) and content fields out of it and mapping them to the "Solr document". So, it makes no sense to have two fields that are binary becaus

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 9:10 AM, Erick Erickson wrote: > I assume you've optimized? Or otherwise ensured that there aren't > any deleted docs It's all straight indexing with DIH from MySQL, so there really are no deleted docs, but about an hour after the rebuild finished, one of the shards did get optimi

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Erick Erickson
I assume you've optimized? Or otherwise ensured that there aren't any deleted docs? Best, Erick On Wed, Jul 30, 2014 at 6:27 AM, Shawn Heisey wrote: > Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a > third-party plugin to a new version that's compatible with Solr 4.9. >

Re: Tika analyzers

2014-07-30 Thread Erick Erickson
Hmmm, might a custom update processor do that? In an update processor, you'd get the binary and be able to do anything at all you wanted to with that. I'm not quite clear on how the binary gets through the Tika bits and gets passed in in the first place, but Best, Erick On Wed, Jul 30, 2014

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Erick Erickson
Sorry for the confusion between "legacy" and "traditional", it's just sloppy terminology. There's no sense of "don't use this" with traditional M/R replication. In fact, when SolrCloud nodes need to catch up with their indexes if they're very out of sync, this is still used. So it's definitely supp

Re: Searching words with spaces for word without spaces in solr

2014-07-30 Thread sunshine glass
This is the analysis page: ​​ ​ Please help me now. On Wed, Jul 30, 2014 at 8:08 PM, sunshine glass < sunshineglassof2...@gmail.com> wrote: > This is the new configuration: > > > positionIncrementGap="100"> >> >> >> >> >> > outputUnigrams="true" tokenSeparat

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Erick Erickson
Doesn't hl.fl work in this case? Or is highlighting the 10 fields the slowdown? Best, Erick On Wed, Jul 30, 2014 at 2:55 AM, Manuel Le Normand < manuel.lenorm...@gmail.com> wrote: > Current I use the classic but I can change my posting format in order to > work with another highlighting compone

Re: Identify specific document insert error inside a solrj batch request

2014-07-30 Thread Jack Krupansky
Agreed that this is a problem with Solr. If it was merely "bad input", Solr should be returning a 4xx error. I don't know if we already have a Jira for this. If not, one should be filed. There are two issues: 1. The status code should be 4xx with an appropriate message about "bad input".

Re: Query on Facet

2014-07-30 Thread Sujit Pal
Hi Smitha, Have you looked at Facet queries? It allows you to attach Solr queries to facets. The problem with this is that you will need to know all possible combinations of language and binding (or make an initial query to find this information). https://wiki.apache.org/solr/SimpleFacetParameter

Re: Searching words with spaces for word without spaces in solr

2014-07-30 Thread sunshine glass
This is the new configuration: positionIncrementGap="100"> > > > > outputUnigrams="true" tokenSeparator=""/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > >

Re: Query on Facet

2014-07-30 Thread vamshi kiran
Hi Alex, As you said, if we exclude the language facet field, it will return all the language facets with counts, right? It will not filter by the binding facet field of type 'paperback'; how can we do this? Thanks & Regards, Vamshi. On Jul 30, 2014 4:11 PM, "Alexandre Rafalovitch" wrote: > I am not sure

Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a third-party plugin to a new version that's compatible with Solr 4.9. After the index was rebuilt, each shard was 28GB ... but before the upgrade, each shard was only 20GB. The number of documents per shard (16.4 million) actually

Identify specific document insert error inside a solrj batch request

2014-07-30 Thread Liram Vardi
Hi All, I have a question regarding the use of HttpSolrServer (SolrJ). I have a collection of SolrInputDocuments I want to send to Solr as a batch. Now, let's assume that one of the docs inside this collection is corrupted (missing some "required" field). When I send the batch of docs to solr usi
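
One client-side workaround, which is not proposed anywhere in this thread and is only a generic pattern, is to retry a failed batch one document at a time so the bad document can be identified. The Python sketch below uses a hypothetical `send` callable standing in for whatever client call posts documents (with SolrJ that would be the add call on the server object); `fake_send` just simulates a required-field check.

```python
def index_with_fallback(docs, send):
    """Send docs as one batch; if the batch fails, resend them one by one
    and return a list of (doc, exception) pairs for those that failed."""
    try:
        send(docs)
        return []
    except Exception:
        failed = []
        for doc in docs:
            try:
                send([doc])
            except Exception as exc:
                failed.append((doc, exc))
        return failed


def fake_send(batch):
    """Stand-in for a real client: rejects any doc without a 'title'."""
    for doc in batch:
        if "title" not in doc:
            raise ValueError(f"missing required field in doc {doc['id']}")


docs = [{"id": 1, "title": "a"}, {"id": 2}, {"id": 3, "title": "c"}]
failures = index_with_fallback(docs, fake_send)
failed_ids = [doc["id"] for doc, _ in failures]
```

The retry costs one request per document in the failing batch, so it is only reasonable when failures are rare. Documents that were already accepted before the batch failed get sent again, which should be harmless when adds overwrite by unique key.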

Search on Date Field

2014-07-30 Thread Pbbhoge
In my Solr index there is a date field (published_date) whose values are in this format: "2012-09-26T10:08:09.123Z". How can I search by a simple input like "2012-09-10" instead of the full ISO date format? Is it possible in Solr?
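
One possible approach, sketched here in Python and not taken from any reply in this archive, is to expand the short date on the client into a range query covering that whole day. It uses Solr date math (`+1DAY`) and a half-open range, `[start TO end}`, which Solr 4.x accepts. The function name `day_range_query` is made up for the example.

```python
from datetime import datetime


def day_range_query(field, day):
    """Turn 'YYYY-MM-DD' into a Solr range query covering that UTC day.
    The upper bound is exclusive, so midnight of the next day is not matched."""
    parsed = datetime.strptime(day, "%Y-%m-%d")  # raises ValueError on bad input
    start = f"{parsed.date().isoformat()}T00:00:00Z"
    return f"{field}:[{start} TO {start}+1DAY}}"


query = day_range_query("published_date", "2012-09-10")
```

When the query is sent over HTTP, the `+` in `+1DAY` has to be URL-encoded as `%2B`, otherwise it arrives as a space.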

Tika analyzers

2014-07-30 Thread Tommaso Teofili
Hi all, while SolrCell works nicely when in need of indexing binary documents, I am wondering about the possibility of having Lucene / Solr documents that have binaries in specific Lucene fields, e.g. title="a nice doc", name="blabla.doc", binary="0x1234...". In that case the "binary" field should

Re: Ranking based on match position in field

2014-07-30 Thread Ahmet Arslan
Hi, Please see : https://issues.apache.org/jira/browse/SOLR-3925 Ahmet On Wednesday, July 30, 2014 2:39 PM, Thomas Michael Engelke wrote: Hi, an example. We have 2 records with this data in the same field (description): 1: Lufthutze vor Kühler Bj 62-65, DS 2: Kühler HY im Austausch, Alttei

Re: Bloom filter

2014-07-30 Thread Per Steffensen
Hi I am not sure exactly what LUCENE-5675 does, but reading the description it seems to me that it would help finding out that there is no document (having an id-field) where version-field is less than . As far as I can see this will not help finding out if a document with id= exists. We want

Ranking based on match position in field

2014-07-30 Thread Thomas Michael Engelke
Hi, an example. We have 2 records with this data in the same field (description): 1: Lufthutze vor Kühler Bj 62-65, DS 2: Kühler HY im Austausch, Altteilpfand 250 Euro A search with the parameters 'description:Kühler' does provide this debug: 2.3234584 = (MATCH) weight(description:kühler in 40

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Harald Kirsch
Hi Daniel, well, I assume there is a performance difference on host B between a) getting some ready-made segments from host A (master, taking care of indexing) to host B (slave, taking care of answering queries) and b) host B (along with host A) doing all the work necessary to prepare incom

Re: Query on Facet

2014-07-30 Thread Alexandre Rafalovitch
I am not sure I fully understood your question, but I would start by looking at Tagging and Excluding first: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http:
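
For reference, the tagging and excluding technique from that wiki page can be sketched as request parameters built in Python. The field names `language` and `binding` are guesses based on the wording of the question; `{!tag=...}` and `{!ex=...}` are the local-params syntax the wiki page describes.

```python
from urllib.parse import urlencode


def facet_params(selected, facet_fields=("binding", "language")):
    """Build Solr params where each selected value filters the results,
    but is excluded when counting the facet of its own field."""
    params = [("q", "*:*"), ("facet", "true")]
    for field, value in selected.items():
        # Tag the filter so the matching facet can ignore it.
        params.append(("fq", f'{{!tag={field}}}{field}:"{value}"'))
    for field in facet_fields:
        prefix = f"{{!ex={field}}}" if field in selected else ""
        params.append(("facet.field", f"{prefix}{field}"))
    return params


params = facet_params({"language": "English"})
query_string = urlencode(params)
```

With this shape, selecting English still narrows the result list and the binding counts, while the language facet keeps showing counts for every language.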

Query on Facet

2014-07-30 Thread Smitha Rajiv
Hi, I need some help with Solr faceting. How do I facet on two fields at the same time to get combination facets and their counts? I'm using the query below to get facets for combinations of language and binding. But now I'm getting only the selected facet in facetList of each field, with its count. Fo

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Daniel Collins
Working backwards slightly, what do you think SolrCloud is going to give you, apart from the consistency of the index (which you want to turn off)? What are "all the other benefits of SolrCloud", if you are querying separate instances that aren't guaranteed to be in sync (since you want to use the

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
Currently I use the classic highlighter, but I can change my posting format in order to work with another highlighting component if that leads to a solution.

PeerSync: too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

2014-07-30 Thread 汤林
Hi, All. I ran into an issue when sending lots of docs to a 2-node SolrCloud. My env has one collection with 2 nodes. The collection has 2 shards with 2 replicas of each shard. We are using Solr 4.7. We found this warning when sending docs to the SolrCloud. And we noticed one reques

Search result at next component

2014-07-30 Thread Lee Chunki
Hi, I am building a new component that runs a new query depending on the previous query's result. The solrconfig.xml setting is like: query newComponent facet mlt highlight stats debug Do you know how I can get the “query component” result

Re: SOLR Schema add constant prefix to field value

2014-07-30 Thread Alexandre Rafalovitch
On Wed, Jul 30, 2014 at 3:21 PM, Eichstädt, Konrad wrote: > Now I would have the same field value with a constant prefix like: > > Your source value in the Clone URP is mis-spelt. So that might be part of the failure. But I would look at RegexReplace URP instead: http://www.solr-start.com/javado

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-6301 On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Hi Per, > > There's LUCENE-5675 which has added a new postings format for IDs. Trying > it out in Solr is in my todo list but maybe you can get to it

SOLR Schema add constant prefix to field value

2014-07-30 Thread Eichstädt , Konrad
Dear Solr User Group, I need your help configuring the Solr schema properly. What I would like to do is this: I have the following field within the schema: Now I would like to have the same field value with a constant prefix like: And there must always be a prefix before the URL value like this: http://b-vm-os

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it out in Solr is in my todo list but maybe you can get to it before me. https://issues.apache.org/jira/browse/LUCENE-5675 On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen wrote: > On 30/07/14 08:55, jim ferencz

Re: Searching and highlighting ten's of fields

2014-07-30 Thread aurelien . mazoyer
Hello, Do you use classic highlighter or fast vector highlighter? Aurélien On 30.07.2014 09:36, Manuel Le Normand wrote: Hello, I need to expose the search and highlighting capabilities over few tens of fields. The edismax's qf param makes it possible but the time performances for searching

Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
Hello, I need to expose search and highlighting capabilities over a few tens of fields. The edismax qf param makes it possible, but the time performance of searching tens of words over tens of fields is problematic. I made a copyField (indexed, not stored) for these fields, which gives way be

Re: Bloom filter

2014-07-30 Thread Per Steffensen
On 30/07/14 08:55, jim ferenczi wrote: Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using "guava like" Bloo
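
For contrast with the single-hash variant described above, here is a small textbook Bloom filter in Python where the false positive ratio is chosen up front and the bit count and number of hash functions are derived from it. This is a generic sketch, not the Lucene or Elasticsearch implementation; the class and function names are made up, and the k probe positions are derived from one SHA-256 digest by double hashing.

```python
import hashlib
import math


def optimal_size(n, p):
    """Bits m and hash count k for n expected keys and target
    false positive ratio p: m = -n ln p / (ln 2)^2, k = (m / n) ln 2."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, expected_keys, false_positive_ratio):
        self.m, self.k = optimal_size(expected_keys, false_positive_ratio)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Double hashing: k probe positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter(expected_keys=1000, false_positive_ratio=0.01)
for i in range(1000):
    bf.add(f"doc-{i}")
```

A negative answer from `might_contain` is always correct, which is what makes such a filter useful for quickly ruling out IDs that do not exist.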

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Harald Kirsch
Thanks Erick, for the confirmation. You say "traditional" but the docs call it "legacy". Not being a native speaker, I might misinterpret the meaning slightly, but to me it conveys the notion of "don't use this stuff if you don't have to". "SolrCloud indexes to all nodes all the time, there's no rea