multivalued field using DIH

2014-03-26 Thread scallawa
I am using Solr 4.7 and am importing data directly from a MySQL database table using the DIH. I have a column similar to the one below, in that it holds multiple values in the database: material cotton "polyester blend" rayon. I would like the data to look like the following wh
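One way to turn such a column into separate values is sketched below in plain Java. The sketch makes some assumptions based only on the example. It treats "material" as the column name. It assumes values are space-separated and that multi-word values are wrapped in double quotes. The class name QuotedValueSplitter is hypothetical. DIH's RegexTransformer can split a column with its splitBy attribute, but a plain whitespace split would break "polyester blend" into two values, so this sketch handles quotes explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of DIH): splits a space-separated column value
 *  into multivalued-field entries, keeping double-quoted phrases together. */
final class QuotedValueSplitter {
    static List<String> split(String raw) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : raw.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;              // toggle phrase mode; the quote itself is dropped
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, values);            // unquoted whitespace ends a value
            } else {
                current.append(c);
            }
        }
        flush(current, values);                    // emit the last value, if any
        return values;
    }

    private static void flush(StringBuilder sb, List<String> out) {
        if (sb.length() > 0) {
            out.add(sb.toString());
            sb.setLength(0);
        }
    }
}
```

For example, a custom DIH Transformer could put a List produced this way back into the row so the values land in a multiValued field. That wiring is not shown and has not been tested here.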

RE: Solr 4.3.1 memory swapping

2014-03-26 Thread Darrell Burgan
Okay well it didn't take long for the swapping to start happening on one of our nodes. Here is a screen shot of the Solr console: https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png And here is a shot of top, with processes sorted by VIRT: https://s3-us-west-2.amazonaws.com/panswers-d

RE: Solr 4.3.1 memory swapping

2014-03-26 Thread Darrell Burgan
Okay I'll post some shots somewhere people can get to them to demonstrate what I'm seeing. Unfortunately I just deployed some unrelated stuff to Solr that caused me to restart each node in the SolrCloud cluster. So right now the swap usage is minimal. I'll let it grow for a few days then send so

Re: charset encoding

2014-03-26 Thread Alexandre Rafalovitch
Can you write a ServletFilter that modifies requests before they hit Solr? I haven't tried this particular scenario myself, but it's something to look at. Regards, Alex.

RE: Solr 4.3.1 memory swapping

2014-03-26 Thread Shawn Heisey
> Thanks - we're currently running Solr inside of RHEL virtual machines inside of VMware. Running "numactl --hardware" inside the VM shows the following: available: 1 nodes (0); node 0 size: 16139 MB; node 0 free: 364 MB; node distances: node 0 0: 10. So there is only one nod

Facetting by field then query

2014-03-26 Thread David Larochelle
I have the following schema. I'd like to be able to facet by a field and then by queries, i.e. "facet_fields": {"media_id": {"1": {"sentence:foo": 102410, "sentence:bar": 29710}, "2": {"sentence:foo": 600, "sentence:bar": 220}, "3": {"sentence:foo": 80, "sentence:bar": 2330}}} However, when

RE: Solr 4.3.1 memory swapping

2014-03-26 Thread Darrell Burgan
Thanks - we're currently running Solr inside of RHEL virtual machines inside of VMware. Running "numactl --hardware" inside the VM shows the following: available: 1 nodes (0); node 0 size: 16139 MB; node 0 free: 364 MB; node distances: node 0 0: 10. So there is only one node being shown. So t

Re: how do I get search for "fort st john" to match "ft saint john"

2014-03-26 Thread Walter Underwood
Step 1 is to use the Analysis tool in the admin UI. That will show what each step in your pipeline is doing. wunder On Mar 26, 2014, at 2:10 PM, solr-user wrote: > I have been using solr for a while but started running across situations > where synonyms are required. > > the example I have is

how do I get search for "fort st john" to match "ft saint john"

2014-03-26 Thread solr-user
I have been using Solr for a while but have started running into situations where synonyms are required. The example I have is a group of city names that look like "Fort Saint John" (a city) in a text field. Users may want to search for "Ft St John", "Fort St John" or "Ft Saint John"; however, my at
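In Solr this is normally handled with a synonym file read by SynonymFilterFactory, for example with entries that map ft to fort and st to saint. The plain-Java sketch below only illustrates the underlying idea: map abbreviations to one canonical form at both index and query time, so all variants produce the same terms. The abbreviation table and the class name CityNameNormalizer are assumptions for illustration, not Solr code.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical illustration of synonym-style normalization (not Solr's SynonymFilter). */
final class CityNameNormalizer {
    // Assumed abbreviation table; in Solr this role is usually played by synonyms.txt.
    private static final Map<String, String> CANONICAL = Map.of(
            "ft", "fort",
            "st", "saint");

    /** Lowercases, splits on whitespace, and maps each token to its canonical form. */
    static String normalize(String text) {
        return Arrays.stream(text.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .map(t -> CANONICAL.getOrDefault(t, t))
                .collect(Collectors.joining(" "));
    }
}
```

Mapping st to saint everywhere would also rewrite "St" when it means "Street", so a real synonym list for place names needs more care than this table.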

Re: Searching multivalue fields.

2014-03-26 Thread Ahmet Arslan
Hi Vijay, after reading the documentation, it seems the following query is what you are after. It will return OrderId:345 without matching OrderId:123: SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234")); SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11")); SpanQu
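The span-based approach assumes that the i-th value of BookingRecordId belongs to the same child item as the i-th value of OrderLineType. Two plain term queries ANDed together ignore that alignment. The plain-Java sketch below shows the difference, using the OrderId:123 values from Vijay's original message with the asterisks dropped. It is a hypothetical model of the matching rule, not Lucene's span implementation.

```java
import java.util.List;

/** Hypothetical model of matching across parallel multivalued fields. */
final class AlignedMultiValueMatch {
    /** Naive check: each value appears somewhere in its field (like two ANDed term queries). */
    static boolean anyPosition(List<String> f1, String v1, List<String> f2, String v2) {
        return f1.contains(v1) && f2.contains(v2);
    }

    /** Aligned check: both values must sit at the same index in the parallel arrays. */
    static boolean samePosition(List<String> f1, String v1, List<String> f2, String v2) {
        int n = Math.min(f1.size(), f2.size());
        for (int i = 0; i < n; i++) {
            if (f1.get(i).equals(v1) && f2.get(i).equals(v2)) {
                return true;
            }
        }
        return false;
    }
}
```

With BookingRecordId ["145", "987", "234"] and OrderLineType ["11", "12", "13"], the naive check matches 234 with 11, but the aligned check does not, because 234 pairs with 13.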

Re: Max value for a TrieDateField is not constant

2014-03-26 Thread Chris Hostetter
: I have a date field in my Solr schema defined as described below. When I'm trying to query field stats, the max value for that date field is not constant; it changes between two distinct date values as I retry the query. Any ideas as to why this is happening? Smells like you might have replic

Re: Searching multivalue fields.

2014-03-26 Thread Ahmet Arslan
Hi Vijay, I personally don't understand joins very well. Just a guess: maybe FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html Ahmet On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur wrote: Hi, I am bumping this thread a

Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-26 Thread Mark Miller
FWIW, you can use merge like this if you run on HDFS rather than the local filesystem. -- Mark On March 26, 2014 at 12:34:39 PM, Shawn Heisey (s...@elyograg.org) wrote: On 3/26/2014 3:14 AM, rulinma wrote: > MergingSolrIndexes: > > http://192.168.22.32:8080/solr/admin/cores?action=mergeinde

Re: Question on highlighting edgegrams

2014-03-26 Thread Software Dev
Is this a known bug? On Tue, Mar 25, 2014 at 1:12 PM, Software Dev wrote: > Same problem here: > http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html > > On Tue, Mar 25, 2014 at 9:39 AM, Software Dev > wrote: >> Bump >> >> On Mon, Mar 24, 2014 at 3:

RE: String Cast Error

2014-03-26 Thread AJ Lemke
An update to this: if I change the q parameter of my search, the error seems to go away. Error: /solr/collection1/select?q=*:*&wt=json&sort=inStock desc No error: /solr/collection1/select?q=Samsung&wt=json&sort=inStock desc AJ

Searching multivalue fields.

2014-03-26 Thread Vijay Kokatnur
Hi, I am bumping this thread again one last time to see if anyone has a solution. In its current state, our application stores child items as multivalued fields. Consider some orders, for example: { OrderId:123 BookingRecordId : ["145", "987", "*234*"] OrderLineType : ["11", "12", "*13*"

Max value for a TrieDateField is not constant

2014-03-26 Thread Dmitriy Morozov
Hi everyone. I'm using SolrCloud, installed from CDH (4.4.0-cdh5.0.0-beta-1). I have a date field in my Solr schema defined as described below. When I query field stats, the max value for that date field is not constant; it changes between two distinct date values as I retry the query.

External fields - How to configure custom data directory?

2014-03-26 Thread bbi123
Hi, I am pretty new to external fields in Solr. Is it possible to configure the path of the external field files? By default, Solr looks in the data/ directory for external field files. I am trying to figure out if there is a way to modify the data directory path. I tried using the /reloadCache

Re: Fwd: Solr | Merge Factor | Implementation

2014-03-26 Thread Shawn Heisey
On 3/26/2014 12:50 PM, Usman Parvez wrote: Thanks Shawn for your reply. Actually, initially I only used 3 in config without defining any policy. But was not able to find any positive results of merging as Number of FDT and FDX was still the same as before. Yesterday, tried these default setting f

RE: index merge

2014-03-26 Thread Susheel Kumar
The Solr Reference Guide (cwiki) says that the indexes being merged must not include duplicate data: https://cwiki.apache.org/confluence/display/solr/Merging+Indexes HTH. Thanks, Susheel

Re: Fwd: Solr | Merge Factor | Implementation

2014-03-26 Thread Usman Parvez
Thanks, Shawn, for your reply. Actually, I initially used only 3 in the config without defining any policy, but I could not see any results from merging, as the number of FDT and FDX files was still the same as before. Yesterday, I tried these default settings from the Solr documentation mentioned earlier but

Re: Wildcard use for partial matches

2014-03-26 Thread pratpor
No. The field is of type "text_en". I can use "text_en_splitting_edge_ngram" for now; it's working for me. But I think it puts a lot of load on Solr, since all the n-grams have to be indexed. That's why I was looking for a solution with wildcards. On Wed, Mar 26, 2014 at 11:20 PM, Erick Erickson [via Lucene]

index merge

2014-03-26 Thread Cihad Guzel
Hi all. I am trying to merge the indexes of two collections with Solr 4.6.1. Each collection contains the same index. I tried the following: http://localhost:8983/solr/admin/cores?action=mergeindexes&core=collection1&srcCore=collection1&srcCore=collection2 It succeeds. But, although id is the unique key
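The cwiki note in this thread says that indexes being merged must not contain duplicate data. That suggests why duplicates appear: merging copies documents as they are, while uniqueKey replacement happens when documents are added through the update path. The toy model below contrasts the two. It identifies documents by id only, it is not Solr code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical toy model: documents are represented by their id only. */
final class MergeSketch {
    /** Index-level merge: documents are copied as-is, so nothing checks the unique key. */
    static List<String> mergeSegments(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }

    /** Re-adding through the update path: a doc whose id already exists replaces the old one. */
    static List<String> reindex(List<String> a, List<String> b) {
        LinkedHashSet<String> ids = new LinkedHashSet<>(a);
        ids.addAll(b);
        return new ArrayList<>(ids);
    }
}
```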

Question About Solr Grouping

2014-03-26 Thread Furkan KAMACI
Hi; When I try this: ... Set urlSet = new HashSet<>(); Map urlMap = new HashMap<>(); ... solrParams = new ModifiableSolrParams(); solrParams.add("q", "*"); solrParams.add("fq", "-title:[* TO *]"); solrParams.add("start", start.toString()); solrParams.add("rows", step.toString()); solrParams.add(

solrcloud distributed search questions

2014-03-26 Thread Joshi, Shital
Hi, we have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. When the GUI fires a query at the Solr URL: 1. Does the node that receives the query send it to each shard in parallel or in sequence? 2. From the log file, how do we find the total time taken to integrate results from different
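To illustrate question 1, here is a generic scatter-gather sketch. The same query goes to every shard concurrently, and the responses are merged into a global top-n by score. This is not Solr's implementation. The Hit class and the use of a Function as a stand-in for a shard are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical scatter-gather model; each "shard" is just a function from query to hits. */
final class ScatterGather {
    /** A hit returned by one shard: document id plus score. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Sends the query to every shard concurrently, waits for all, keeps the global top-n. */
    static List<Hit> search(List<Function<String, List<Hit>>> shards, String query, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: start every shard request before waiting on any of them.
            List<CompletableFuture<List<Hit>>> pending = shards.stream()
                    .map(shard -> CompletableFuture.supplyAsync(() -> shard.apply(query), pool))
                    .collect(Collectors.toList());
            // Gather: wait for each response, then merge by descending score.
            return pending.stream()
                    .flatMap(f -> f.join().stream())
                    .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                    .limit(n)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

All futures are started before any join() call, which lets the shard requests overlap. Joining inside the first map() would serialize them.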

Re: Solr 4.3.1 memory swapping

2014-03-26 Thread Lan
It could be related to NUMA. Check out this article about it which has some fixes that worked for me. http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

Re: Wildcard use for partial matches

2014-03-26 Thread Erick Erickson
Looks like maybe you're defining the field as a string type? On Mar 26, 2014 9:07 AM, "pratpor" wrote: > I am having some strange issue with solr that I found nowhere on the web. I > was looking for partial matches for some fields in solr. I know of the "*" > wildcard which can be used for the sa
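Erick's hint can be illustrated with a simplified model of indexed terms. A text field splits "Some value" into separate terms, so val* finds the term value. A string field indexes the whole value as one term, and that term does not start with val. The class below is a hypothetical sketch, not Solr's analysis chain; real text_en analysis also stems, removes stop words, and so on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of how a trailing-* wildcard sees indexed terms. */
final class PrefixMatchSketch {
    /** Text-like field (simplified): lowercase and split on whitespace into separate terms. */
    static List<String> textTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** String field: the whole value is a single, unmodified term. */
    static List<String> stringTerms(String value) {
        return List.of(value);
    }

    /** A prefix wildcard such as val* matches if any indexed term starts with the prefix. */
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }
}
```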

Re: stored=true vs stored=false, in terms of storage

2014-03-26 Thread Rafał Kuć
Hello! Stored fields, the ones with stored=true, mean that the original value of the field is stored and can be retrieved with the document in search results. Fields marked as stored=false can't be retrieved during searching. That is the difference. -- Regards, Rafał Kuć

stored=true vs stored=false, in terms of storage

2014-03-26 Thread Pramod Negi
Hi, I am using Solr and I have one question. If a field has stored=false, does that mean the field is stored on disk rather than in main memory, and will be loaded whenever requested? The scenario I would like to handle: in my case there is a lot of information that I need to show when d

New to Solr can someone help me to know if Solr fits my use case

2014-03-26 Thread Saurabh Agarwal
Hi all, I am new to Solr and, from initial reading, I am quite convinced Solr will be of great help. Can anyone help me make that decision? Use case: 1. I will have PDF and Word docs generated daily/weekly (a lot of them), which get overwritten frequently. 2. I have a dictionary kind of thin

Re: What contributes to disk IO?

2014-03-26 Thread Walter Underwood
A merge requires reading 100% of two or more segments, and writing all the non-deleted docs into a brand-new segment. So generally, a whole bunch of sequential reads and writes. This Mike McCandless post from 2011 has lovely visualizations of merge behavior, plus some info about the number of w

Re: Fwd: Solr | Merge Factor | Implementation

2014-03-26 Thread Shawn Heisey
On 3/26/2014 8:23 AM, Usman Parvez wrote: Hi Team, I'm setting up merge factor for my Solr index. Defined following config values in solrConfig.xml. 10 10 3 Number of FDT, FDX and Doc files in index folder is still 22 each. Am I missing something? I just wan

Re: leaks in solr

2014-03-26 Thread Harish Agarwal
Thanks for the help -- after fortuitously looking at a separate thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3CCAJYe0M4qNKzqT4gB-qib0T6%3DY1KYr7vKcNEYDHWH1MnMoCLtYw%40mail.gmail.com%3E I upgraded to 7u25 and all is well! On a separate note, you'd mentioned DocVa

Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-26 Thread Shawn Heisey
On 3/26/2014 3:14 AM, rulinma wrote: MergingSolrIndexes: http://192.168.22.32:8080/solr/admin/cores?action=mergeindexes&core=collection_cms2&indexDir=/home/solr/mrl/data/index I use it in local that supported well, but in solrCloud(3 machine) that not work well. can anyone give me some advice.

Re: What contributes to disk IO?

2014-03-26 Thread Otis Gospodnetic
Lucene segment merges cause both reads and writes. If you look at SPM, you'll see the number of index files and the number of segments, which will give you an idea what's going on at that level. Otis

Re: What contributes to disk IO?

2014-03-26 Thread Shawn Heisey
On 3/25/2014 6:12 PM, Software Dev wrote: What are the main contributing factors for Solr Cloud generating a lot of disk IO? A lot of reads? Writes? Insufficient RAM? I would think if there was enough disk cache available for the whole index there would be little to no disk IO. Toke's answer

SolrIndexSearch slow doc retrieval problems

2014-03-26 Thread jfeist
I'm writing a search extension that needs to write results to an external cache. Solr will only return a handful of results to the user, but it puts all the rest into this cache, where the user can do other operations on the results if they need to. The only thing that needs to go into thi

Re: leaks in solr

2014-03-26 Thread Shawn Heisey
On 3/25/2014 4:06 PM, harish.agarwal wrote: I'm having a very similar issue to this currently on 4.6.0 (large java.lang.ref.Finalizer usage, many open file handles to long gone files) -- were you able to make any progress diagnosing this issue? A few questions: Are you using any contrib or thi

How to exclude a mimetype in tika?

2014-03-26 Thread eShard
Good afternoon, I'm using Solr 4.0 Final. I need movies "hidden" in zip files to be excluded from the index. I can't filter movies on the crawler because then I would have to exclude all zip files. I was told I can have Tika skip the movies, but the details are escaping me at this point. How d

How do you reliably update and restart nodes in a SolrCloud?

2014-03-26 Thread Rich Mayfield
I am trying to sort out what updating a relatively simple SolrCloud 4.1 deployment (one shard, 500 collections, 2 replicas each collection) looks like. From experience and from reading other accounts, just restarting both Solr instances is a coin toss - both instances get tied up trying to rec

More Robust Search Timeouts (to Kill Zombie Queries)?

2014-03-26 Thread Salman Akram
With reference to this thread, I wanted to know if there was any response to that, or if Chris Harris himself can comment on what he ended up doing; that would be gr

Fwd: Solr | Merge Factor | Implementation

2014-03-26 Thread Usman Parvez
Hi Team, I'm setting up the merge factor for my Solr index. I defined the following config values in solrconfig.xml: 10 10 3. The number of FDT, FDX and Doc files in the index folder is still 22 each. Am I missing something? I just want to enable auto merge with mergeFacto
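To show what a merge factor does, the toy model below simulates log-style merging. Each flush adds one segment. Whenever mergeFactor segments accumulate at one level, they are merged into a single segment at the next level. This is a simplified assumption for illustration, not Lucene's actual merge policy, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of log-style merging (not Lucene's MergePolicy). */
final class LogMergeModel {
    /** Simulates `flushes` flushed segments and returns the segment count per level. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        List<Integer> perLevel = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            addOne(perLevel, level);                    // a flush creates one level-0 segment
            // Cascade: a full level collapses into one segment one level up.
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                level++;
                addOne(perLevel, level);
            }
        }
        return perLevel;
    }

    static int totalSegments(List<Integer> perLevel) {
        return perLevel.stream().mapToInt(Integer::intValue).sum();
    }

    private static void addOne(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }
}
```

In this model, with mergeFactor 3, 9 flushes collapse into 1 segment and 10 flushes leave 2. In general, the segment count equals the digit sum of the flush count written in base mergeFactor.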

Re: Wildcard use for partial matches

2014-03-26 Thread Ahmet Arslan
Hi Pratpor, with the complex phrase query parser recently added to trunk (SOLR-1604), it is possible to use "Some va*" types of queries. But you need to use Solr trunk for this, or wait for Solr 4.8. Ahmet On Wednesday, March 26, 2014 3:14 PM, pratpor wrote: I am having some strange

Wildcard use for partial matches

2014-03-26 Thread pratpor
I am having a strange issue with Solr that I could not find anywhere on the web. I was looking for partial matches on some fields in Solr. I know of the "*" wildcard, which can be used for this. Now I want to search a field with a value of, say, "Some value". The problem is: when I do field:val*

Re: Solr Cloud Shows the server is down

2014-03-26 Thread Erick Erickson
Could well be you're not doing hard commits (openSearcher=true|false doesn't matter), so your transaction log is getting replayed on startup. See: http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup and http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-comm

Re: charset encoding

2014-03-26 Thread Antoine LE FLOC'H
Thank you for this. This workaround using "ie" works great. However, it is handled fairly early by Solr, before the request handlers are called, so it cannot be set through solrconfig. Does anybody have an idea how we can force "ie" all the time by simply changing some Solr settings? (not c

Solr Cloud Shows the server is down

2014-03-26 Thread Sathya
Hi friends, this is Sathya. I am new to SolrCloud. I set up SolrCloud on 5 different machines and indexed nearly 5 crore (50 million) records. When I restart Tomcat, it takes a long time to start (nearly 20 to 30 min). After Tomcat starts, my SolrCloud graph shows the server is down. I don't know why.

Re: Bug with OpenJDK on Ubuntu - affects Solr users

2014-03-26 Thread Markus Jelsma
Hi - as far as I know, it has never been a good idea to run Lucene on OpenJDK 6 at all; only Oracle Java 6 or higher, or OpenJDK 7. On Wednesday, March 26, 2014 06:54:41 PM Nigel Sheridan-Smith wrote: > Hi all, > > This is a bit of a 'heads up'. We have recently come across this bug on > U

Re: search WITH or WITHOUT accents (selection at runtime) + highlights

2014-03-26 Thread Ahmet Arslan
Hi Elfu, I found where I saw the request. Robert says "...why not propose a modification to the highlighter so it can highlight field A with field B's stored value?" in https://issues.apache.org/jira/browse/LUCENE-3415 If you want to create a JIRA issue, anyone can create an account. Regarding

MergingSolrIndexes not supported by SolrCloud?why?

2014-03-26 Thread rulinma
MergingSolrIndexes: http://192.168.22.32:8080/solr/admin/cores?action=mergeindexes&core=collection_cms2&indexDir=/home/solr/mrl/data/index It works well when I use it locally, but in SolrCloud (3 machines) it does not work well. Can anyone give me some advice? Thanks.

Re: search WITH or WITHOUT accents (selection at runtime) + highlights

2014-03-26 Thread Ahmet Arslan
Hi, not out of the box. I remember this request came up before; you could create a JIRA issue, and if it gets attention, it could be added as a new feature. Highlighting requires stored=true; basically, you want to highlight on fieldX using the stored values of fieldY, where fieldX and fieldY are copy fields of
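For a runtime choice between accent-sensitive and accent-insensitive matching, one common pattern uses two fields. One field is passed through an accent-folding filter such as ASCIIFoldingFilterFactory, and the query picks which field to search. The sketch below shows the folding step in plain Java with java.text.Normalizer: decompose, then drop combining marks. It approximates, but is not identical to, Lucene's ASCIIFoldingFilter. The class name is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical illustration of accent folding with a runtime switch. */
final class AccentFolding {
    /** Decomposes accented characters (NFD) and removes the combining marks. */
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Chooses exact or accent-insensitive comparison at query time. */
    static boolean matches(String indexed, String query, boolean ignoreAccents) {
        return ignoreAccents ? fold(indexed).equals(fold(query)) : indexed.equals(query);
    }
}
```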

Re: Multiple Languages in Same Core

2014-03-26 Thread Liu Bo
Hi Jeremy, there have been a lot of multi-language discussions, with two main approaches: 1. like yours, one core per language; 2. all in one core, with each language in its own field. We have multi-language support in a single core; each multilingual field has its own suffix, such as name_en_US. We custom

Re: search WITH or WITHOUT accents (selection at runtime) + highlights

2014-03-26 Thread elfu
Hi, let's say I have to index 10,000,000 documents (content from files); in this case, storing the content twice could be a problem (or not, if I can use external storage for the content, i.e. the same content file). As I understand it, the highlighter uses the text from the "hl.fl" field using this

Bug with OpenJDK on Ubuntu - affects Solr users

2014-03-26 Thread Nigel Sheridan-Smith
Hi all, this is a bit of a 'heads up'. We have recently come across this bug on Ubuntu with OpenJDK: https://bugs.launchpad.net/ubuntu/+source/openjdk-6/+bug/1295987 Basically, finalizers are not being run, so effectively none of the commits written in SolrIndexWriter are garbage collected.

Re: What contributes to disk IO?

2014-03-26 Thread Toke Eskildsen
On Wed, 2014-03-26 at 01:12 +0100, Software Dev wrote: > What are the main contributing factors for Solr Cloud generating a lot > of disk IO? > > A lot of reads? Writes? Insufficient RAM? Searching is heavy random I/O reads, indexing is bulk reads and writes. > I would think if there was enough

Re: AND not as a boolean operator in Phrase

2014-03-26 Thread abhishek jain
Hi Jack, you are right: I am using 'and' as a stop word at both index and query time. Should I use it only during indexing? Thanks On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky wrote: > What does your field type analyzer look like? > > I suspect that you have a stop filter which cause "and"