few and huge tlogs

2013-09-17 Thread YouPeng Yang
Hi. According to http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup, the tlog file switches to a new one when a hard commit happens. However, my tlogs show something different: tlog.003 5.16GB tlog.004 1.56GB tlog.002 610

Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread Shawn Heisey
On 9/17/2013 12:32 AM, YouPeng Yang wrote: > Hi > Another weird problem. > When we set up the autocommit properties, we assumed that an index > file would be created on every commit, so that the index files would > be large enough. We do not want to keep too many small files as in [1]. >

Problem indexing windows files

2013-09-17 Thread Yossi Nachum
Hi, I am trying to index my Windows PC files with ManifoldCF version 1.3 and Solr version 4.4. I created an output connection and a repository connection and started a new job that scans my E drive. Everything seems to work OK, but after a few minutes Solr stops getting new files to index. I am seei

Scoring by document size

2013-09-17 Thread blopez
Hi all, I have some doubts about the Solr scoring function. I'm using the default configuration throughout, but I'm facing a weird issue with the retrieved scores. In the schema, I'm going to focus on the only field I'm interested in. Its definition is: *

Re: dih delete doc per $deleteDocById

2013-09-17 Thread Shalin Shekhar Mangar
What is your question? On Tue, Sep 17, 2013 at 12:17 AM, andreas owen wrote: > i am using dih and want to delete indexed documents by xml-file with ids. i > have seen $deleteDocById used in > > data-config.xml: > url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml" >

Re: Re-Ranking results based on DocValues with custom function.

2013-09-17 Thread Mathias Lux
Hi! Thanks for the directions! I got it up and running with a custom ValueSourceParser: http://pastebin.com/cz1rJn4A and a custom ValueSource: http://pastebin.com/j8mhA8e0 It basically allows for searching for text (which is associated to an image) in an index and then getting the distance to a s

Re: Scoring by document size

2013-09-17 Thread Upayavira
Have you used debugQuery=true, or fl=*,[explain], or those various functions? It is possible to ask Solr to tell you how it calculated the score, which will enable you to see what is going on in each case. You can probably work it out for yourself then I suspect. Upayavira On Tue, Sep 17, 2013, a
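
A minimal sketch of the kind of request suggested above, in Python. It only builds the URL and does not contact a server. The base URL, the field name `text` and the query terms are placeholders chosen for illustration, not values taken from the thread; `debugQuery=true` is the parameter named in the message.

```python
from urllib.parse import urlencode

def explain_url(base, query, rows=10):
    """Build a select URL that asks Solr to include score explanations."""
    params = {
        "q": query,
        "fl": "*,score",        # return stored fields plus the score
        "rows": rows,
        "debugQuery": "true",   # adds a debug section with per-document explanations
        "wt": "json",
    }
    return base + "/select?" + urlencode(params)

# Placeholder core URL and query; adjust to your own setup.
url = explain_url("http://localhost:8983/solr/collection1", "text:(A B C)")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should show the explanation next to the normal results, which is usually enough to see which scoring factor dominates.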

Re: Scoring by document size

2013-09-17 Thread Mathias Lux
As the IDF values for A, B and C are minimal (couldn't get any worse than being in any document), the major part of your score comes most likely from the coord(..) part of scoring - which basically computes the overlap of the query and the document. If you want to have a stronger influence you can

Re: how soft-commit works

2013-09-17 Thread Erick Erickson
Here's a rather long blog post I wrote up that might help: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Mon, Sep 16, 2013 at 1:43 PM, Shawn Heisey wrote: > On 9/16/2013 7:01 AM, Matteo Grolla wrote: > > Can anyone explain me

Re: Dynamic row sizing for documents via UpdateCSV

2013-09-17 Thread Erick Erickson
Well, it's reasonably easy if you have empty columns, in the same order, for _all_ of the possible dynamic fields, but I really doubt you are that fortunate... It's especially ugly in that you have the different dynamic fields scattered around. How is the csv file generated? Could you force every

Re: dih delete doc per $deleteDocById

2013-09-17 Thread Andreas Owen
i would like to know how to get it to work and delete documents per xml and dih. On 17. Sep 2013, at 1:47 PM, Shalin Shekhar Mangar wrote: > What is your question? > > On Tue, Sep 17, 2013 at 12:17 AM, andreas owen wrote: >> i am using dih and want to delete indexed documents by xml-file with i

Re: Atomic commit across shards?

2013-09-17 Thread Erick Erickson
There are two things to think about here. 1> if you're issuing the commit manually (i.e. not relying on the settings in solrconfig.xml) then they are atomic. The call doesn't return until all the active nodes have seen the commit. 2> However, autocommits are usually time based. Since servers start

Re: few and huge tlogs

2013-09-17 Thread Erick Erickson
Probably because you're indexing a lot of documents very quickly. It's entirely reasonable to have much shorter autoCommit times, all that does is 1> truncate the transaction log 2> close the current segment 3> start a new segment. That should cut down your tlog files drastically. Try setting your

Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread Erick Erickson
Here's a blog about tlogs and commits: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ And here's Mike's excellent segment merging blog http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Best, Erick On Tue, Sep 17, 2

Re: Scoring by document size

2013-09-17 Thread Erick Erickson
This kind of artificial test is almost always misleading. Some approximations are used, in particular the length of the field is not stored as an exact number, so at various points some fields with slightly different lengths are "rounded" to the same number, thus the identical scores you're seeing.

Re: How to round solr score ?

2013-09-17 Thread Mamta Thakur
Hi, As per this post: http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score. I was able to use my custom function in sort (defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc) - works, but can't facet on the same (defType=func&q=socialDegree(id,1)&fl=score,*&facet=tru

Re: spellcheck causing Core Reload to hang

2013-09-17 Thread Raheel Hasan
I think they should have it in RC0, because if you search this forum at lucene, this issue has been there since version 4.3! Regards, Raheel On Tue, Sep 17, 2013 at 5:58 PM, Erick Erickson wrote: > H, do we have a JIRA tracking this and does it seem like any fix will > get into 4.5? > > I thi

Re: spellcheck causing Core Reload to hang

2013-09-17 Thread Raheel Hasan
Check this thread: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-td3192748i20.html This issue has been there since 2011. On Tue, Sep 17, 2013 at 6:35 PM, Raheel Hasan wrote: > I think they shou

check which file/document cause solr to work hard

2013-09-17 Thread Yossi Nachum
Hi, I am trying to index my Windows PC files with ManifoldCF version 1.3 and Solr version 4.4. A few minutes after I start the crawler job, I see that the Tomcat process constantly consumes 100% of one CPU (I have two CPUs). I checked the thread dump in Solr admin and saw that the following threads take

tlog after commit

2013-09-17 Thread Alejandro Calbazana
Quick question... Should I still see tlog files after a hard commit? I'm trying to test soft commit and hard commits and I was under the impression that tlog would be removed after a hard commit where, in the case of soft commits, I would still see them. Thanks, Al
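
One way to check this empirically is to list the transaction log directory before and after issuing a commit and compare the results. A small Python sketch, assuming the log files are named `tlog.<number>` (the naming that appears elsewhere in this list); the directory path is something you supply, and the helper name `tlog_report` is made up for this example.

```python
from pathlib import Path

def tlog_report(tlog_dir):
    """Return (file name, size in bytes) for each tlog.* file, sorted by name."""
    files = sorted(
        p for p in Path(tlog_dir).iterdir()
        if p.is_file() and p.name.startswith("tlog.")
    )
    return [(p.name, p.stat().st_size) for p in files]
```

Calling `tlog_report(...)` on the core's tlog directory once before the commit and once after it shows which files were rolled over, which remain, and how large they are.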

Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil
Hi, I am using solr 4.4 in solr cloud configuration. When i try to 'set' a field in a document using the update request handler, I get a 'missing required field' error. However, when I send this query to the specific shard containing the document, the update succeeds. Is this a bug in solr 4.4 or

Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil Subramanian
Hi, I am using solr 4.4 in solr cloud configuration. When i try to 'set' a field in a document using the update request handler, I get a 'missing required field' error. However, when I send this query to the specific shard containing the document, the update succeeds. Is this a bug in solr 4.4 or

Re: Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Yonik Seeley
On Tue, Sep 17, 2013 at 10:47 AM, Sesha Sendhil Subramanian wrote: > I am using solr 4.4 in solr cloud configuration. When i try to 'set' a > field in a document using the update request handler, I get a 'missing > required field' error. Can you show the exact error message you get, and the updat

Re: How to round solr score ?

2013-09-17 Thread Chris Hostetter
: 'score' is a pseudo-field, i.e., it does not actually exist in : the index, which is probably why it cannot be faceted on. : Faceting on a rounded score seems like an unusual use : case. What requirement are you trying to address? agreed, more details would be helpful. FWIW: the only way avail

Re: Re-Ranking results based on DocValues with custom function.

2013-09-17 Thread Chris Hostetter
: It basically allows for searching for text (which is associated to an : image) in an index and then getting the distance to a sample image : (base64 encoded byte[] array) based on one of five different low level : content based features stored as DocValues. very cool. : So there one little tin

Re: Solr node goes down while trying to index records

2013-09-17 Thread Furkan KAMACI
Do you get that error only when indexing? 2013/9/17 neoman > Hello everyone, > one or more of the nodes in the solrcloud go down randomly when we try to > index data using solrj APIs. The nodes do recover. but when we try to index > back, they go down again > > Our configuration: > 3 shards > S

SolrCloud liveness problems

2013-09-17 Thread Vladimir Veljkovic
Hello there, we have the following setup: SolrCloud 4.4.0 (3 nodes, physical machines) Zookeeper 3.4.5 (3 nodes, physical machines) We have a number of rather small collections (~10K or ~100K of documents), that we would like to load to all Solr instances (numShards=1, replication_factor=3), and a

Re: Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil Subramanian
curl http://localhost:8983/solr/search/update -H 'Content-type:application/json' -d ' [ { "id": "c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675", "link_id_45454" : {"set":"abcdegff"} } ]' I have two collections search and meta. I want to do an update in the sea
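
For reference, the curl call above can be expressed in Python roughly as follows. This is only a sketch: it builds the request object and does not send it. The URL, document id, field name and value are copied from the curl command; the helper name `atomic_set_request` is made up for this example.

```python
import json
from urllib.request import Request

def atomic_set_request(update_url, doc_id, field, value):
    """Build (but do not send) a JSON update request that 'set's one field."""
    # Atomic update syntax: the field value is a dict with a modifier key.
    body = json.dumps([{"id": doc_id, field: {"set": value}}]).encode("utf-8")
    return Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = atomic_set_request(
    "http://localhost:8983/solr/search/update",
    "c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675",
    "link_id_45454",
    "abcdegff",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. Printing the body first makes it easy to include the exact update in a follow-up message together with the error text.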

Solr node goes down while trying to index records

2013-09-17 Thread neoman
Hello everyone, one or more of the nodes in the solrcloud go down randomly when we try to index data using solrj APIs. The nodes do recover. but when we try to index back, they go down again Our configuration: 3 shards Solr 4.4. I see the following exceptions in the log file. <09/17/13 15:33:32:

Re: Solr node goes down while trying to index records

2013-09-17 Thread neoman
yes. the nodes go down while indexing. if we stop indexing, it does not go down. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Dynamic row sizing for documents via UpdateCSV

2013-09-17 Thread Utkarsh Sengar
Yeah I think the only way to go about it is via SolrJ. The csv file is generated by a pig job which computes the data to be loaded in solr. I think this is what I will end up doing: Load all the possible columns in the csv with a value of 0 if the value doesn't exist for a specific record. I was ju

Re: SolrCloud liveness problems

2013-09-17 Thread Mark Miller
On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic wrote: > Hello there, > > we have following setup: > > SolrCloud 4.4.0 (3 nodes, physical machines) > Zookeeper 3.4.5 (3 nodes, physical machines) > > We have a number of rather small collections (~10K or ~100K of documents), > that we would

Getting a query parameter in a TokenFilter

2013-09-17 Thread Isaac Hebsh
Hi everyone, We developed a TokenFilter. It should act differently depending on a parameter supplied in the query (for the query chain only, not the index one, of course). We found no way to pass that parameter into the TokenFilter flow. I guess that the root cause is that TokenFilter is a pure luce

Re: Stop zookeeper from batch

2013-09-17 Thread Furkan KAMACI
Are you looking for this: https://issues.apache.org/jira/browse/ZOOKEEPER-1122 On Monday, 16 September 2013, Prasi S wrote: > Hi, > We have setup solrcloud with zookeeper and 2 tomcats . we are using a batch > file to start the zookeeper, uplink config files and start to

Some text not indexed in solr4.4

2013-09-17 Thread Utkarsh Sengar
I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum" "description": "The DC44 Animal i

Re: How to round solr score ?

2013-09-17 Thread Gora Mohanty
On 17 September 2013 18:31, Mamta Thakur wrote: > Hi , > > As per this post here > http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score. > I was able to use my custom fn in > sort(defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc) - works, > but can't facet on the

Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-17 Thread Furkan KAMACI
Currently I have over 50 million documents in my index and, as I mentioned before in another question, I have some problems while indexing (jetty EOF exception). I know that problem may not be about index size, but I just want to learn whether there is any limit for document size in Solr that if I exceed

Re: Solr node goes down while trying to index records

2013-09-17 Thread Furkan KAMACI
Could you give some information about your jetty.xml and give more info about your index rate and RAM usage of your machines? On Tuesday, 17 September 2013, neoman wrote: > yes. the nodes go down while indexing. if we stop indexing, it does not go > down. > > > > -- > View this

Re: tlog after commit

2013-09-17 Thread Furkan KAMACI
Did you check here: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ On Tuesday, 17 September 2013, Alejandro Calbazana wrote: > Quick question... Should I still see tlog files after a hard commit? > > I'm trying to test soft

Re: Problem indexing windows files

2013-09-17 Thread Furkan KAMACI
Firstly, this may not be a Solr-related problem. Did you check the Solr log file? Tika may have problems in some situations. For example, when parsing HTML that has a base64 encoded image it may have some problems. If you find the correct logs you can detect it. On the other tak

Re: Some text not indexed in solr4.4

2013-09-17 Thread Utkarsh Sengar
To add to it, I see the exact problem with the queries: "nikon d7100", "nikon d5100", "samsung ps-we450" etc. Thanks, -Utkarsh On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar wrote: > I have a copyField called allText with type text_general: > https://gist.github.com/utkarsh2012/6167128#file-sc

Re: Some text not indexed in solr4.4

2013-09-17 Thread Furkan KAMACI
On the other hand did you check here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters what it says about MultiPhraseQuery? On Wednesday, 18 September 2013, Furkan KAMACI wrote: > Hi; > > Did you run commit command? > > On Wednesday, 18 September 2013, Utkarsh S

Re: SPLITSHARD failure right before publishing the new sub-shards

2013-09-17 Thread HaiXin Tie
Never mind. I figured it out. It was due to an NPE on the missing updateLog in solrconfig.xml. My solrconfig.xml is from an older Solr release, which doesn't have certain required sections, etc. After adding them to solrconfig.xml per this official doc, everything started to work. It'd be great if

Re: Some text not indexed in solr4.4

2013-09-17 Thread Furkan KAMACI
Hi; Did you run commit command? On Wednesday, 18 September 2013, Utkarsh Sengar wrote: > To add to it, I see the exact problem with the queries: "nikon d7100", > "nikon d5100", "samsung ps-we450" etc. > > Thanks, > -Utkarsh > > > On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Seng

Re: Updated: CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)

2013-09-17 Thread HaiXin Tie
Never mind. I figured it out. It was due to an NPE on the missing updateLog in solrconfig.xml. My solrconfig.xml is from an older Solr release, which doesn't have certain required sections, etc. After adding them to solrconfig.xml per this official doc, everything started to work. http://wiki.a

Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-17 Thread Otis Gospodnetic
Hi 50m docs across 18 servers 48gb RAM ain't much. I doubt you are hitting any limits in lucene or solr. How heavy is your index rate? Otis Solr & ElasticSearch Support http://sematext.com/ On Sep 17, 2013 5:25 PM, "Furkan KAMACI" wrote: > Currently I hafer over 50+ millions documents at my in

Re: Some text not indexed in solr4.4

2013-09-17 Thread Jason Hellman
Utkarsh, Check to see if the value is actually indexed into the field by using the Terms request handler: http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d (adjust the prefix to whatever you're looking for) This should get you going in the right direction. Jason On Sep 17, 2013,
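
A small Python sketch of the Terms handler check described above. It only builds the URL. The base URL matches the one in the message; the field name `allText` and the prefix `dc4` are taken from the original question as illustrative values, and the `terms.limit` parameter is an optional addition not mentioned in the thread.

```python
from urllib.parse import urlencode

def terms_url(base, field, prefix, limit=20):
    """Build a Terms handler URL listing indexed terms that start with `prefix`."""
    params = {
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with this string
        "terms.limit": limit,    # optional cap on the number of terms returned
    }
    return base + "/terms?" + urlencode(params)

print(terms_url("http://localhost:8983/solr", "allText", "dc4"))
```

If the terms you expect do not appear in the response, the text was probably split or altered by the analysis chain at index time, which narrows the problem down to the field type rather than the query.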

Querying a non-indexed field?

2013-09-17 Thread Scott Schneider
Hello, Is it possible to restrict query results using a non-indexed, stored field? e.g. I might index fewer fields to reduce the index size. I query on a few indexed fields, getting a small # of results. I want to restrict this further based on values from non-indexed, stored fields. I can

Re: Querying a non-indexed field?

2013-09-17 Thread Walter Underwood
No. --wunder On Sep 17, 2013, at 5:16 PM, Scott Schneider wrote: > Hello, > > Is it possible to restrict query results using a non-indexed, stored field? > e.g. I might index fewer fields to reduce the index size. I query on a few > indexed fields, getting a small # of results. I want to r

Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread YouPeng Yang
Hi Erick and Shawn Thanks a lot 2013/9/17 Erick Erickson > Here's a blog about tlogs and commits: > > http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > And here's Mike's excellent segment merging blog > > http://blog.mikemccandless.com/20

Solr SpellCheckComponent only shows results with certain fields

2013-09-17 Thread jazzy
I'm trying to get the Solr SpellCheckComponent working but am running into some issues. When I run .../solr/collection1/select?q=*%3A*&wt=json&indent=true These results are returned { "responseHeader": { "status": 0, "QTime": 1, "params": { "indent": "true", "q": "*:*",

Re: SolrCloud liveness problems

2013-09-17 Thread Mark Miller
SOLR-5243 and SOLR-5240 will likely improve the situation. Both fixes are in 4.5 - the first RC for 4.5 will likely come tomorrow. Thanks to yonik for sussing these out. - Mark On Sep 17, 2013, at 2:43 PM, Mark Miller wrote: > > On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic > wrote: >

Facets with " " values are displayed in output

2013-09-17 Thread Prasi S
Hi, I'm using Solr 4.4 for our search. When I query for a keyword, it returns empty-valued facets in the response *1* 1 I have also tried using the facet.missing parameter, but no change. How can we handle this? Thanks, Prasi

how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Liu Bo
Hi all, Our system has distributed MySQL databases: we create a database for every customer who signs up and assign it to one of our MySQL hosts. We currently use lucene core to perform search on these databases, and we write java code to loop through these databases and convert the data to luce

Re: how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Alexandre Rafalovitch
You can create multiple entities in DIH definition and they will all run. Means duplicating the mapping definition apart from dataSource name, but is doable. Alternatively, the configuration file is read on every call to DIH. You can edit file between different invocations or autogenerate differen

Re: how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Raymond Wiker
You can also define multiple dataimporthandlers in solrconfig.xml, each with their own data-config. On Wed, Sep 18, 2013 at 7:45 AM, Alexandre Rafalovitch wrote: > You can create multiple entities in DIH definition and they will all run. > Means duplicating the mapping definition apart from data

SORTING RESULTS BASED ON RELEVANCY

2013-09-17 Thread PAVAN
Hi, I am using fuzzy logic and it is giving exact results, but I need to sort the results based on relevancy, meaning that closer matches come first. Can anyone help with this? Regards, Pavan. -- View this message in context: http://lucene.472066.n3.nabble.com/SORTING-RESULTS-BASED-O