Re: get content is put in the index queue but is not committed

2013-02-21 Thread Miguel
Thanks Chris, I'm going to look at both the UpdateLog and RealTimeGetComponent classes, but I'm not sure if I can use them because I'm working with Apache Solr version 1.4.1 (I know it is old). Anyway, I'll tell you my problem. I am developing a custom class that extends UpdateRequestProcessorFactory.

Solr as local service for .NET desktop app

2013-02-21 Thread Knacktus
I need some advanced search features for a desktop application. The application is a .NET (C#) application, so I can't use Lucene, and as I'm not sure about the future of Lucene.NET, I am considering using Solr (with SolrNET). As I need a cache for the desktop app anyway, it seems to be a good opportunity

Re: DIH deleting documents

2013-02-21 Thread cveres
I should also add that some of the books don't have chapters, so the query won't succeed for these books. But in this case I expected that the document won't be added at all .. rather than first added then deleted (which I am now suspecting is the case). It would be very helpful if I could see a li

solr 4 fragmentsBuilder and highlightMultiTerm

2013-02-21 Thread cmd.ares
How do I configure solrconfig.xml to enable fragmentsBuilder and highlightMultiTerm on 4.0 and 4.1? I read the document on the wiki, but I don't know where the snippet should be placed or how to call it by URL path. Thanks -- View this message in context: http://lucene.472066.n3.nab

Re: How do I create two collections on the same cluster?

2013-02-21 Thread Shawn Heisey
On 2/21/2013 9:50 PM, Shankar Sundararaju wrote: I am using Solr 4.1. I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at boot time. After the cluster is up, I am trying to create collection2 with 2 leaders and 2 replicas just like collection1. I am using following collec

How do I create two collections on the same cluster?

2013-02-21 Thread Shankar Sundararaju
I am using Solr 4.1. I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at boot time. After the cluster is up, I am trying to create collection2 with 2 leaders and 2 replicas just like collection1. I am using following collections API for that: http://localhost:7575/solr/adm

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Mark Miller
The leader doesn't really do a lot more work than any of the replicas, so I don't think it's likely that important. If someone starts running into problems, that's usually when we start looking for solutions. - Mark On Feb 21, 2013, at 10:20 PM, "Vaillancourt, Tim" wrote: > I sent this reques

RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
I sent this request to "ServerA" in this case, which became the leader of all shards. As far as I know you're supposed to issue this call to just one server as it issues the calls to the other leaders/replicas in the background, right? I am expecting the single collections API call to spread the

Re: Index optimize takes more than 40 minutes for 18M documents

2013-02-21 Thread Yandong Yao
Thanks Walter for the info, we will disable optimize then and do more testing. Regards, Yandong 2013/2/22 Walter Underwood > That seems fairly fast. We index about 3 million documents in about half > that time. We are probably limited by the time it takes to get the data > from MySQL. > > Don't opti

Re: Solr splitting my words

2013-02-21 Thread Jack Krupansky
The issue may simply be that your indexed data has the mixed case and your query has only lower case. So, the suggested change won't affect the query itself, but will cause the indexed data to be indexed differently. -- Jack Krupansky -Original Message- From: scallawa Sent: Thursday,

Re: Document update question

2013-02-21 Thread Shawn Heisey
On 2/21/2013 10:00 AM, Jack Park wrote: Interesting you should say that. Here is my solrj code: public Solr3Client(String solrURL) throws Exception { server = new HttpSolrServer(solrURL); // server.setParser(new XMLResponseParser()); } I cannot reca

Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-21 Thread Chris Hostetter
: Hi everyone, i am new to solr technology and not getting a way to get back : the original HTML document with Hits highlighted into it. what : configuration and where i can do to instruct SolrCell/ Tika so that it does : not strips down the tags of HTML document in the content field. I _think_ w

Re: can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread michaelweica
Thanks. We have 1 master and 5 slave servers, and we use the slaves as production servers. We only update the master index when we have new content. Our index is now almost 88G, and the server has just 1 core and 8G RAM (JVM: Xmx60964M -Xms1024M), so it easily runs out of memory. So I plan to deploy a new server to

Re: can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread Mingfeng Yang
I cannot give a definitive answer, but I think there could be problems, as the index formats in 3.3 and 4.1 are slightly different. Why don't you upgrade to 4.1? The only thing you need to do is 1. install solr 4.1 2.1 copy all related config files from 3.3 2.2 back up the in

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Upayavira
Which of your three hosts did you point this request at? Upayavira On Thu, Feb 21, 2013, at 09:13 PM, Vaillancourt, Tim wrote: > Correction, I used this curl: > > curl -v > 'http://:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' > > So 3

RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
Correction, I used this curl: curl -v 'http://:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader of all 3 shards in 4.1 with this call. Tim Vaillancourt -Original M
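For readers following along, here is a minimal Python sketch of how the CREATE call quoted above could be assembled, together with the arithmetic behind "3 instances, 3 shards, 2 replicas per shard". The host is missing from the original message, so `solr-host.example` is a purely hypothetical placeholder, and the helper names `create_collection_url` and `cores_fit` are made up for illustration. The capacity check reflects my understanding of how maxShardsPerNode constrains placement, not a statement about Solr's internal code.

```python
from urllib.parse import urlencode


def create_collection_url(host, name, num_shards, replication_factor,
                          max_shards_per_node, port=8983):
    """Build a Collections API CREATE URL with the parameters from the curl above."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("maxShardsPerNode", max_shards_per_node),
    ]
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


def cores_fit(num_shards, replication_factor, max_shards_per_node, num_nodes):
    """Check that all cores can be placed: shards * copies <= nodes * per-node limit."""
    return num_shards * replication_factor <= num_nodes * max_shards_per_node


# Hypothetical host name; the original message does not show one.
url = create_collection_url("solr-host.example", "test", 3, 2, 2)
print(url)
print(cores_fit(3, 2, 2, num_nodes=3))
```

With the values from the message, 3 shards times 2 copies gives 6 cores, which is exactly what 3 instances can hold at 2 cores each.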

Re: Matching an exact word

2013-02-21 Thread Sebastian Saip
And keep in mind you do need quotes around your searchTerm if it consists of multiple words - q=text_exact_field:"your_unquoted_query" otherwise Solr will interpret "two words" as: "exact_field:two defaultfield:words" (Maybe not directly applicable for your problem Kristian, but I just want to men

Re: Matching an exact word

2013-02-21 Thread SUJIT PAL
You could also do this outside Solr, in your client. If your query is surrounded by quotes, then strip away the quotes and make q=text_exact_field:your_unquoted_query. Probably better to do outside Solr in general keeping in mind the upgrade path. -sujit On Feb 21, 2013, at 12:20 PM, Van Tasse
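A rough Python sketch of the client-side rewrite described above, assuming the exact-match copy of the field is called `text_exact_field` as in the message. The function name `to_exact_query` is hypothetical. Multi-word terms keep their quotes, following the note elsewhere in this thread that an unquoted second word would be searched against the default field. Escaping of special query characters is deliberately left out.

```python
def to_exact_query(user_query, exact_field="text_exact_field"):
    """Rewrite a quoted user query so that it targets the unstemmed field.

    Unquoted queries are returned unchanged, so they still go through the
    normal (stemmed) field handling.
    """
    q = user_query.strip()
    is_quoted = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    if not is_quoted:
        return q
    term = q[1:-1].strip()
    if " " in term:
        # Keep the phrase quoted so every word stays bound to exact_field.
        return f'{exact_field}:"{term}"'
    return f"{exact_field}:{term}"


print(to_exact_query('"created"'))
print(to_exact_query('"two words"'))
print(to_exact_query("created"))
```

Doing this in the client keeps solrconfig.xml and the query parser untouched, which is the upgrade-path argument made above.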

RE: Matching an exact word

2013-02-21 Thread Van Tassell, Kristian
Thank you. So essentially I need to write a custom query parser (extending upon something like the QParser)? -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Thursday, February 21, 2013 12:22 PM To: solr-user@lucene.apache.org Subject: Re: Matching an exact word Solr

Re: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread Jack Park
Marcelo In some sense, it sounds like you are aiming at building a topic map of all your resources. Jack On Thu, Feb 21, 2013 at 11:54 AM, Marcelo Elias Del Valle wrote: > Hello David, > > First of all, thanks for answering! > > 2013/2/21 David Quarterman > >> Looked through your site and

Re: DIH deleting documents

2013-02-21 Thread cveres
Hi Gora and Arcadius, Thanks for your help. I'll try and answer both your questions here. I am interested in three database tables. "Book" contains information about books, "page" has the content of each book page by page, and "chapter" contains the title of each chapter in every book, and the pa

Re: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread Marcelo Elias Del Valle
Hello David, First of all, thanks for answering! 2013/2/21 David Quarterman > Looked through your site and the framework looks very powerful as an > aggregator. We do a lot of data aggregation from many different sources in > many different formats (XML, JSON, text, CSV, etc) using RDBMS a

Re: Combining Solr score with customized user ratings for a document

2013-02-21 Thread Chris Hostetter
: With this approach now I can boost (i.e. multiply Solr's score by a factor) : the results of any query by doing something like this: : http://localhost:8080/solr/Prueba/select_test?q={!boost : b=rating(usuario1)}text:grapa&fl=score : : Where 'rating' is the name of my function. : : Unfortunate

RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
Thanks Mark, The real driver for me wanting to promote a different leader is when I create a new Collection via the Collections API across a multi-server SolrCloud, the leader of each shard is always the same host, so you're right that I'm tackling the wrong problem with this request, although

can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread michaelweica
Hi, our Solr master is version 3.3. Can I install a new Solr 4.1 box as a slave and replicate the data from the master? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976.html Sent from the Solr - User mailing list a

Re: Slaves always replicate entire index & Index versions

2013-02-21 Thread Amit Nithian
Sounds good I am trying the combination of my patch and 4413 now to see how it works and will have to see if I can put unit tests around them as some of what I thought may not be true with respect to the commit generation numbers. For your issue above in your last post, is it possible that there w

Re: Is their a way to remove the unwanted characters from solr index

2013-02-21 Thread Chris Hostetter
: I have a field in which I have strings with unwanted characters like : \n\r\n\n, and I wanted to know if there is any way I can remove : these...actually I had data stored in html format in the sql database : column which I had to index in solr...using HTML stripe I had removed the : HTML t

Re: get content is put in the index queue but is not committed

2013-02-21 Thread Chris Hostetter
: Anybody know how-to get content is put in the index queue but is not : committed? I'm guessing you are referring to uncommitted documents in the transaction log? Take a look at the UpdateLog class, and how it's used by the RealTimeGetComponent. If you provide more details as to what you end

Re: DIH deleting documents

2013-02-21 Thread Arcadius Ahouansou
Hi Csaba. Would you mind posting your DIHconfig/data-config.xml and the command you use for the import? Thanks. Arcadius. On 21 February 2013 17:55, Gora Mohanty wrote: > On 21 February 2013 19:30, cveres wrote: >> Thanks Gora, >> >> Sorry I might not have been sufficiently clear. >> >> I st

Re: Solr UIMA

2013-02-21 Thread Chris Hostetter
: Subject: Solr UIMA : References: <5123b218.7050...@juntadeandalucia.es> : In-reply-to: <5123b218.7050...@juntadeandalucia.es> https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing

Re: Matching an exact word

2013-02-21 Thread Upayavira
Solr will only match on the terms as they are in the index. If it is stemmed in the index, it will match that. If it isn't, it'll match that. All term matches are (by default at least) exact matches. Only with stemming you are doing an exact match against the stemmed term. Therefore, there really

Re: splitting big, existing index into shards

2013-02-21 Thread Upayavira
You can split an index using the MultiPassIndexSplitter, which is in Lucene contrib. However, it won't use the same algorithm for assigning documents to shards, which means the indexes won't work with a SolrCloud setup. A splitter that uses the same split technique but uses the shard assignment al

Matching an exact word

2013-02-21 Thread Van Tassell, Kristian
I'm trying to match the word "created". Given that it is surrounded by quotes, I would expect an exact match to occur, but instead the entire stemming results show for words such as create, creates, created, etc. q="created"&wt=xml&rows=1000&qf=text&defType=edismax If I copy the text field to a

splitting big, existing index into shards

2013-02-21 Thread zqzuk
Hi I have built a 300GB index using lucene 4.1 and now it is too big to do queries efficiently. I wonder if it is possible to split it into shards, then use SolrCloud configuration? I have looked around the forum but was unable to find any tips on this. Any help please? Many thanks! -- View t

Re: Solr splitting my words

2013-02-21 Thread scallawa
I tried playing with the analyzer before posting and wasn't sure how to interpret it. Field type: text Field value index: womens-mcmurdo-ii-boots (this is based on the info that is in the field) Field value query: mcmurdo Results: I only got one match in the index analyzer org.apache.solr.analys

Re: DIH deleting documents

2013-02-21 Thread Gora Mohanty
On 21 February 2013 19:30, cveres wrote: > Thanks Gora, > > Sorry I might not have been sufficiently clear. > > I start with an empty index, then add documents. > 9000 are added and 6000 immediately deleted again, leaving 3000. > I assume this can only happen with duplicate IDs, but that should no

Re: Index optimize takes more than 40 minutes for 18M documents

2013-02-21 Thread Walter Underwood
That seems fairly fast. We index about 3 million documents in about half that time. We are probably limited by the time it takes to get the data from MySQL. Don't optimize. Solr automatically merges index segments as needed. Optimize forces a full merge. You'll probably never notice the differen

Re: Threads running while querrying

2013-02-21 Thread Ido Kissos
I get a 2 second response time on average. Any config / hardware change suggestions for my use case (low qps rate)? I would say more shards on the same node, but that would have the disadvantage of diminished caches. On Wednesday, February 20, 2013, Walter Underwood wrote: > In production, you should hav

"synonym replacement" in AnalyzingSuggester?

2013-02-21 Thread Sebastian Saip
I'm using the new AnalyzingSuggester (my code is available on http://pastebin.com/tN9yXHB0) and I got the synonyms "whisky,whiskey" (they are bi-directional) So whether the user searches for whiskey or whisky, I want to retrieve all documents that have any of them. However, for autosuggest, I wou

Re: How to change the index dir in Solr 4.1

2013-02-21 Thread Mingfeng Yang
How about passing -Dsolr.data.dir=/ur/data/dir on the command line to java when you start the Solr service? On Thu, Feb 21, 2013 at 9:05 AM, chamara wrote: > Yes that is what i am doing now? I taught this solution is not elegant for > a > deployment? Is there any other way to do this from the Solr

Re: How to change the index dir in Solr 4.1

2013-02-21 Thread chamara
Yes, that is what I am doing now. I thought this solution was not elegant for a deployment. Is there any other way to do this from the SolrConfig.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891p4041950.html Sent from the So

Re: Document update question

2013-02-21 Thread Jack Park
Interesting you should say that. Here is my solrj code: public Solr3Client(String solrURL) throws Exception { server = new HttpSolrServer(solrURL); // server.setParser(new XMLResponseParser()); } I cannot recall why I commented out the setParser line;

Re: Document update question

2013-02-21 Thread Timothy Potter
Weird - the only difference I see is that we use XML vs. JSON, but otherwise, doing the following works for us: VALU1 VALU2 Result would be: VALU1 VALU2 On Thu, Feb 21, 2013 at 9:44 AM, Jack Park wrote: > I am using 4.1. I was not aware of that link. In the absence of being > able to do

Re: How to change the index dir in Solr 4.1

2013-02-21 Thread Timothy Potter
Have you tried leaving: ${solr.data.dir:} in solrconfig.xml and then setting the data dir for each core in the solr.xml, i.e. On Thu, Feb 21, 2013 at 7:13 AM, chamara wrote: > I am having 5 shards in one machine using the new one collection multiple > cores method. I am trying to change the in

Re: Document update question

2013-02-21 Thread Jack Park
I am using 4.1. I was not aware of that link. In the absence of being able to do partial updates to multi-valued fields, I just punted to delete and reindex. I'd like to see otherwise. Many thanks Jack On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter wrote: > Hi Jack, > > There was a bug for this

Re: Is their a way in which I can make spell suggestion dictionary build on specific fileds

2013-02-21 Thread Alexandre Rafalovitch
AnalyzingSuggester might also be worth having a look at (requires some Googling and SO reading to get it right for now). Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from

Re: Document update question

2013-02-21 Thread Timothy Potter
Hi Jack, There was a bug for this fixed for 4.1 - which version are you on? I remember this b/c I was on 4.0 and had to upgrade for this exact reason. https://issues.apache.org/jira/browse/SOLR-4134 Tim On Wed, Feb 20, 2013 at 9:16 PM, Jack Park wrote: > From what I can read about partial upda

Re: Is their a way in which I can make spell suggestion dictionary build on specific fileds

2013-02-21 Thread Jack Krupansky
Yes, each spellchecker (or "dictionary") in your spellcheck search component has a "field" parameter to specify the field to be used to generate the dictionary index for that spellchecker: spell See the Solr example solrconfig.xml and search for name="spellchecker">. Also see: http://wiki.ap

Re: SolrCloud as my primary data store

2013-02-21 Thread Timothy Potter
With Solr's atomic updates, optimistic locking, update log, openSearcher=false on commits, etc. you can definitely do this. Biggest question in my mind is whether you're willing to accept Solr's emphasis on consistency vs. write-availability? With a db like Cassandra, you can achieve better write-

Re: Solr splitting my words

2013-02-21 Thread Jack Krupansky
The word splitting is caused by "splitOnCaseChange: 1". Change that "1" to "0" and completely reindex your data. -- Jack Krupansky -Original Message- From: scallawa Sent: Thursday, February 21, 2013 7:47 AM To: solr-user@lucene.apache.org Subject: Solr splitting my words Let me start
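To make the effect of splitOnCaseChange easier to picture, here is a toy Python imitation. It is not the real WordDelimiterFilterFactory logic, only the case-change rule followed by lowercasing, and it assumes the indexed text spells the word as "McMurdo" (mixed case), which the original post does not show in full. The names `split_on_case_change` and `matches` are hypothetical.

```python
import re

# Toy rule: an optional capital followed by lowercase letters, or a run of capitals.
# This only imitates splitting on case changes; it is not the Lucene filter.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+")


def split_on_case_change(token):
    """Split a token where the case changes, then lowercase each part."""
    return [part.lower() for part in _PARTS.findall(token)]


def matches(query, indexed_value):
    """True if every term produced from the query is among the indexed terms."""
    indexed_terms = set(split_on_case_change(indexed_value))
    return all(term in indexed_terms for term in split_on_case_change(query))


indexed = "McMurdo"  # assumed mixed-case spelling in the indexed data
for q in ("McMurdo", "murdo", "mcmurdo"):
    print(q, split_on_case_change(q), matches(q, indexed))
```

Under these assumptions the index holds the terms "mc" and "murdo", so "McMurdo" and "murdo" match while the all-lowercase "mcmurdo" stays one term and finds nothing, which is the behaviour reported in the original question.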

Re: Solr splitting my words

2013-02-21 Thread Timothy Potter
Feed your data into the Analysis form to see the transformations taking place. Navigate to the Solr admin console, select your collection name on the left (e.g. collection1). Click on Analysis link. I suspect it's the WordDelimiterFilterFactory that is not doing what you expect, which you can fine-

Solr splitting my words

2013-02-21 Thread scallawa
Let me start out by saying that I am just learning Solr now. Solr is splitting a word and I am not sure why. The word is mcmurdo. If I do a search for McMurdo it picks it up. If I do a search for just murdo it will also pick it up. If I search for mcmurdo, I get nothing. "womens-mcmurdo-ii-bo

Re: SOLR4 SAN vs Local Disk?

2013-02-21 Thread chamara
Thanks Shawn for the Input, I could actually get RAID10's. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR4-SAN-vs-Local-Disk-tp4041299p4041895.html Sent from the Solr - User mailing list archive at Nabble.com.

How to change the index dir in Solr 4.1

2013-02-21 Thread chamara
I have 5 shards on one machine using the new one collection multiple cores method. I am trying to change the index directory, but if I hard code that in the SolrConfig.xml, the index dir does not change for the other cores, so each core fights over it and it ends up in a deadlock. Is there

Re: SolrCloud vs. distributed suggester

2013-02-21 Thread Mark Miller
It's not really any different in SolrCloud than in pre-cloud - distrib search is still the same code done the same way by and large. shards.qt should be just as valid an option as forcing a query component. - Mark On Feb 21, 2013, at 7:56 AM, AlexeyK wrote: > In pre-cloud version of SOLR it wa

Re: DIH deleting documents

2013-02-21 Thread cveres
Thanks Gora, Sorry I might not have been sufficiently clear. I start with an empty index, then add documents. 9000 are added and 6000 immediately deleted again, leaving 3000. I assume this can only happen with duplicate IDs, but that should not be possible! So I wanted to get a list of deleted do

Re: multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
Never mind. I just realized the difference between the two. Sorry for the noise. Bill On Thu, Feb 21, 2013 at 8:42 AM, Bill Au wrote: > There have been requests for supporting multiple facet.prefix for the same > facet.field. There is an open JIRA with a patch: > > https://issues.apache.org

multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
There have been requests for supporting multiple facet.prefix for the same facet.field. There is an open JIRA with a patch: https://issues.apache.org/jira/browse/SOLR-1351 Wouldn't using multiple facet.query achieve the same result? I mean something like: facet.query=lastName:A*&facet.query=la
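A small Python sketch of the multiple facet.query idea, for illustration only. The field name `lastName` and the prefix "A" come from the message; the second prefix "B" is a hypothetical stand-in because the original example is cut off, and `q=*:*` with `rows=0` is my own choice for a counts-only request. The helper name `prefix_facet_queries` is made up.

```python
from urllib.parse import urlencode, parse_qsl


def prefix_facet_queries(field, prefixes):
    """Build a query string with one facet.query per prefix on the given field."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    # A list of tuples lets the same parameter name repeat.
    params += [("facet.query", f"{field}:{prefix}*") for prefix in prefixes]
    return urlencode(params)


qs = prefix_facet_queries("lastName", ["A", "B"])
print(qs)
print(parse_qsl(qs))
```

One thing to keep in mind: each facet.query yields a single count of matching documents, while facet.prefix lists the individual field values that start with the prefix, each with its own count. That may be the difference referred to in the follow-up message.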

SolrCloud vs. distributed suggester

2013-02-21 Thread AlexeyK
In pre-cloud version of SOLR it was necessary to pass shards and shards.qt parameters in order to make /suggest handler work standalone. How should it work in SolrCloud? SpellCheckComponent skips the distributed stage of processing and thus I get suggestions only when I force distrib=false mode. Se

Re: How to retrive all terms with their frequency in that website.

2013-02-21 Thread Alexander Golubowitsch
I guess the Term Vector Component might satisfy all or most of what you're trying to do: http://wiki.apache.org/solr/TermVectorComponent On 21.02.2013 12:58, search engn dev wrote: I have indexed data of 10 websites in solr. Now i want to dump data of each website with following format : [Term

Re: How to retrive all terms with their frequency in that website.

2013-02-21 Thread Miguel
Hi. Look at the Luke page in the Solr admin: /admin/luke?show=index That page shows the topTerms, so I suppose it is possible to get the frequency of all terms. On 21/02/2013 12:58, search engn dev wrote: I have indexed data of 10 websites in solr. Now i want to dump data of each website with follo

How to retrive all terms with their frequency in that website.

2013-02-21 Thread search engn dev
I have indexed data of 10 websites in Solr. Now I want to dump the data of each website in the following format: [Terms, Frequency of terms in that website, IDF] Can I do this with the Solr admin, or do I need to write a script for that? -- View this message in context: http://lucene.472066.n3.nabble.co

RE: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread David Quarterman
Hi Marcelo, Looked through your site and the framework looks very powerful as an aggregator. We do a lot of data aggregation from many different sources in many different formats (XML, JSON, text, CSV, etc) using RDBMS as the main repository for eventual SOLR indexing. A 'one-stop-shop' for all

Re: Slaves always replicate entire index & Index versions

2013-02-21 Thread raulgrande83
Thanks for the patch, we'll try to install these fixes and post if replication works or not. I renamed 'index.' folders to just 'index' but it didn't work. These lines appeared in the log: INFO: Master's generation: 64594 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex I

Re: Solr UIMA

2013-02-21 Thread Tommaso Teofili
Hi Bart, I think the only way you can do that is by reindexing, or maybe by just doing a dummy atomic update [1] to each of the documents (e.g. adding or changing a field of type 'ignored' or something like that) that weren't "tagged" by UIMA before. Regards, Tommaso [1] : http://wiki.apache.org

Re: Slaves always replicate entire index & Index versions

2013-02-21 Thread Amit Nithian
Thanks for the links... I have updated SOLR-4471 with a proposed solution that I hope can be incorporated or amended so we can get a clean fix into the next version so our operations and network staff will be happier with not having gigs of data flying around the network :-) On Thu, Feb 21, 2013

Re: DIH deleting documents

2013-02-21 Thread Gora Mohanty
On 21 February 2013 14:27, cveres wrote: > I am adding documents with data import handler from a mysql database. I > create a unique id for each document by concatenating a couple of fields in > the database. Every id is unique. > > After the import, over half the documents which were imported are

Re: Slaves always replicate entire index & Index versions

2013-02-21 Thread raulgrande83
Hi Amit, I have come across some JIRAs that may be useful in this issue: https://issues.apache.org/jira/browse/SOLR-4471 https://issues.apache.org/jira/browse/SOLR-4354 https://issues.apache.org/jira/browse/SOLR-4303 https://issues.apache.org/jira/browse/SOLR-4413 https://issues.apache.org/jira/br

Re: Slaves always replicate entire index & Index versions

2013-02-21 Thread Amit Nithian
A few others have posted about this too apparently and SOLR-4413 is the root problem. Basically what I am seeing is that if your index directory is not index/ but rather index. set in the index.properties a new index will be downloaded all the time because the download is expecting your index to be

DIH deleting documents

2013-02-21 Thread cveres
I am adding documents with data import handler from a mysql database. I create a unique id for each document by concatenating a couple of fields in the database. Every id is unique. After the import, over half the documents which were imported are deleted again, leaving me with less than half the