How to apply Semantic Search in Solr
Hello,

I am working on an event listing and promotions website (http://allevents.in) and I want to apply semantic search in Solr. For example, if someone searches for "Musical Events in New York", it should return results such as:

* Musical Night at ABC place
* Concerts Events
* Classical Music Event

I mean that all results should be semantically related to the search query; they should not be ranked only on "tf-idf". Can you please help me understand how to proceed with applying semantic search in Solr?

--
Regards,
*Sohan Kalsariya*
Re: How to apply Semantic Search in Solr
And how would it know to give you those results? Obviously, you have some sort of magic/algorithm in your mind. Are you doing geographic location match, category match, synonyms match?

We can't really help with generic questions. You still need to figure out what "semantic" means for you specifically.

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya wrote:
> [...]
organize folder inside Solr
Hello,

I'm a beginner with Apache Solr. My task is to organize folders inside Solr. I've read a bit about collections, cores, and all that. What I don't understand is why every document inside the collection is in XML or JSON. How can I put my folder inside Solr? Should I create another collection and put my converted data (to XML) into it? Please guide me, I'm lost.

Best regards.

--
View this message in context: http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: organize folder inside Solr
Well, a couple of things:

1> Solr does NOT index documents in XML; that is just the input format. Well, one of the input formats. Internally there's a complex inverted index storage format.

2> What do you mean by "organize into folders"? The common way is just to put them all into a single core and also index a field with the path to the file. You can then do things like "show all files in folder X" by adding fq=filepath:"path/to/folder/x" to your query. Also look at PathHierarchyTokenizerFactory for interesting ways to get partial paths; with it, in the example above you'd use fq=filepath:"path/to" to get everything in the tree below "path/to".

But this really sounds like an XY problem. You've asked for information about cores without clearly stating what problem you're trying to solve. How are people intending to _use_ the search app you're going to build?

Best,
Erick

On Sat, Mar 8, 2014 at 7:01 AM, blach wrote:
> [...]
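
The partial-path idea in the reply above can be illustrated with a small sketch. It is not the Solr or Lucene tokenizer; `PathPrefixes`, `expand`, and `folderFilter` are hypothetical names, and the code only imitates the kind of prefix tokens a path-hierarchy tokenizer would be expected to produce for a '/'-delimited path, plus the fq string quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Solr or Lucene class): it only imitates the
// prefix tokens a path-hierarchy tokenizer emits for a '/'-delimited path.
final class PathPrefixes {

    // Returns every ancestor prefix of the path, shortest first,
    // ending with the full path itself.
    static List<String> expand(String path) {
        List<String> prefixes = new ArrayList<>();
        int from = 0;
        while (true) {
            int slash = path.indexOf('/', from);
            if (slash < 0) {
                prefixes.add(path);          // no more delimiters: add the whole path
                return prefixes;
            }
            if (slash > 0) {
                prefixes.add(path.substring(0, slash));
            }
            from = slash + 1;
        }
    }

    // Builds the filter-query value used in the thread: filepath:"<folder>"
    // (the field name "filepath" is taken from the reply, not from any schema).
    static String folderFilter(String folder) {
        return "filepath:\"" + folder + "\"";
    }
}
```

If the indexed field holds all of these prefixes for each file, a filter on "path/to" would match every file below that folder, which is the behaviour the reply describes.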
Re: How to apply Semantic Search in Solr
You will need to implement a semantic/text classification/categorization algorithm and annotate each document with additional fields for the categories before presenting it to Solr.

For a couple of examples, see:
http://www.slideshare.net/lucenerevolution/text-classification-with-lucenesolr-apache-hadoop-and-libsvm
http://www.slideshare.net/teofili/text-categorization-with-lucene-and-solr

Or google for "text classification" or "text categorization" and "solr".

-- Jack Krupansky

-----Original Message-----
From: Sohan Kalsariya
Sent: Saturday, March 8, 2014 4:27 AM
To: solr-user@lucene.apache.org
Subject: How to apply Semantic Search in Solr

[...]
Re: Solrj Backward Compatibility After 4.5.1
I've added a comment at that issue.

Thanks;
Furkan KAMACI

2014-03-07 21:30 GMT+02:00 Shawn Heisey:
> On 3/7/2014 11:58 AM, Furkan KAMACI wrote:
> > Hi;
> >
> > I have a SolrCloud cluster running 4.5.1. When I use a SolrJ version greater
> > than 4.5.1, I get an error when deleting a document via SolrJ's
> > CloudSolrServer. When I change the version to 4.5.1, it works as usual.
> >
> > I know that I should use the same versions to avoid compatibility issues.
> > However, I think that there should/may be backward compatibility between
> > Solr 4.x versions. If not, it would be nice to write down a compatibility
> > list somewhere.
> >
> > My question is: what has changed after 4.5.1 so that I cannot delete a
> > document with higher versions of SolrJ when using CloudSolrServer?
>
> There was a bug for backwards compatibility problems in recent SolrJ.
>
> https://issues.apache.org/jira/browse/SOLR-5762
>
> That bug is resolved as of 4.7, but recently it has come to our
> attention that the problem is not entirely fixed, and appears to be a
> bigger issue than we thought.
>
> I don't recall seeing a new bug filed yet. I don't fully understand the
> latest problems, or I would have already filed a new bug.
>
> Thanks,
> Shawn
Re: organize folder inside Solr
Well, for now I just want to put some data (binary, PDF, JPEG, ... anything) inside Solr. Should I put the files in by hand (copy/paste) inside \solr-4.7.0\example\solr\collection1, or is there another way to do it?

Thanks.
Med.

--
View this message in context: http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122219.html
Re: organize folder inside Solr
You really have to back up and do your homework here. Please work through the tutorial below, it'll help clear up some of your confusion. For instance, you feed documents _to_ Solr, after you've defined a schema, figured out your use-cases, etc. You don't just stick documents somewhere, turn Solr on and magically be able to search them.

https://lucene.apache.org/solr/4_7_0/tutorial.html

Best,
Erick

On Sat, Mar 8, 2014 at 8:44 AM, blach wrote:
> [...]
Re: What is mean by Index Searcher?
Hi;

At this point I suggest you read this:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks;
Furkan KAMACI

2014-03-07 10:44 GMT+02:00 Alexandre Rafalovitch:
> Some events close and reopen the searcher. Commit is the main one
> during the lifetime of a Solr server. So, you can read this as "until commit".
> Of course, you have soft and hard commits with settings to reopen or
> not reopen the searcher, so you may want to read up on that if you are
> trying to understand this all completely.
>
> Regards,
> Alex.
>
> On Fri, Mar 7, 2014 at 3:25 PM, search engn dev wrote:
> > Thanks Alex,
> >
> > But what is meant by "...lifetime of that searcher."? Is it the lifetime
> > of any particular query, or what?
> >
> > Sorry, but I am not able to understand this. :(
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/What-is-mean-by-Index-Searcher-tp4121898p4121912.html
Re: SolrCloud with Tomcat
Hi;

Could you check here:
http://lucene.472066.n3.nabble.com/Error-when-creating-collection-in-Solr-4-6-td4103536.html

Thanks;
Furkan KAMACI

2014-03-07 9:44 GMT+02:00 Vineet Mishra:
> Hi
>
> I am installing SolrCloud with 3 external ZooKeeper instances
> (localhost:2181, localhost:2182, localhost:2183) and 2 Tomcats
> (localhost:8181, localhost:8182), all on a single machine (just for
> getting started).
> By following these links
>
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> http://wiki.apache.org/solr/SolrCloudTomcat
>
> I have got the Solr UI on the machine pointing to
>
> http://localhost:8181/solr/#/~cloud
>
> In the Cloud Graph view it shows
>
> mycollection
> |
> |_ shard1
> |_ shard2
>
> But both shards are empty and show no cores or replicas.
>
> Following the
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> blog, I have been successful up to starting Tomcat; the problem starts
> after the section "Creating Collection, Shard(s), Replica(s) in SolrCloud".
>
> Giving the command to create a replica for the shard using
>
> curl 'http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1'
>
> gives the error
>
> 400 (QTime: 137)
> Error CREATEing SolrCore 'shard1-replica-2': 192.168.2.183:8182_solr_shard1-replica-2 is removed
> 400
>
> Has anybody gone through this issue?
>
> Regards
Re: Partial Counts in SOLR
The issue with timeAllowed is that you never know whether it will return the minimum number of docs or not.

I do want the docs to be sorted by date, but it seems it's not possible for Solr to start searching from the most recent docs and stop after finding a certain number of docs... any other tweak?

Thanks

On Saturday, March 8, 2014, Chris Hostetter wrote:
> : Reason: In an index with millions of documents I don't want to know that a
> : certain query matched 1 million docs (of course it will take time to
> : calculate that). Why don't just stop looking for more results lets say
> : after it finds 100 docs? Possible??
>
> but if you care about sorting, ie: you want the top 100 documents sorted
> by score, or sorted by date, you still have to "collect" all 1 million
> matches in order to know what the first 100 are.
>
> if you really don't care about sorting, you can use the "timeAllowed"
> option to tell the searching method to do the best job it can in an
> (approximated) limited amount of time, and then pretend that the docs
> collected so far represent the total number of matches...
>
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
>
> -Hoss
> http://www.lucidworks.com/

--
Regards,
Salman Akram
Project Manager - Intelligize
NorthBay Solutions
410-G4 Johar Town, Lahore
Off: +92-42-35290152
Cell: +92-302-8495621
Re: howto count total word amount of all documents in solr index?
Hi;

Do you want this:
http://localhost:8983/solr/#/collection1/schema-browser?field=text_general

Thanks;
Furkan KAMACI

2014-03-07 10:48 GMT+02:00 cqlangyi:
> hi there,
>
> i have the following question, please help me out, much appreciated.
>
> say i have a field configured as "text_general" type, and indexed 3 pieces
> of content as documents.
> 1. "today is a good day"
> 2. "call your family every day"
> 3. "come with me"
>
> how could i count the total (even roughly) number of words in these 3
> documents? with the above, the result should be "13" at max, or something
> a little less if stopwords are enabled.
>
> thanks a lot.
>
> Cq
Re: organize folder inside Solr
Thanks, I followed it carefully. The example in the tutorial indexes only XML files, and that is my problem: I want my search engine to handle other formats like pictures, music, PDF, and so on. And I'm working just on "collection1".

Med.

--
View this message in context: http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122242.html
Re: organize folder inside Solr
Hello!

There are multiple tutorials on how to do this, for example:

1. http://solr.pl/en/2011/03/21/solr-and-tika-integration-part-1-basics/
2. http://wiki.apache.org/solr/ExtractingRequestHandler

--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
Re: Caching requests to Solr
Following up on this, I've created https://issues.apache.org/jira/browse/SOLR-5826, with a draft patch.

Regards,
Tommaso

2014-03-05 8:50 GMT+01:00 Tommaso Teofili:
> Hi all,
>
> I have the following requirement where I have an application talking to
> Solr via SolrJ where I don't know upfront which type of Solr instance that
> will be communicating with, while this is easily solvable by using
> different SolrServer implementations I also need a way to ensure that all
> the indexing requests will go through in the correct order even if the Solr
> instance(s) will be down for a while. This means that if the Solr instance
> / cluster is down I need to cache the requests e.g. in an ordered queue and
> let them be processed out of the queue as soon as the instance / cluster
> comes up again.
> For this I was thinking to implementing a wrapping SolrServer which takes
> the "root" SolrServer as a parameter and delegates all the requests to it
> while it keeps a queue where all the (indexing) requests start going as
> soon as one is failing due to a IO / Connection issue and that gets
> continuously processed in order to pull requests out as soon as it's
> possible to communicate again with the Solr instance / cluster.
> I wonder then if there's any other approach you can think of to handle
> this maybe leveraging existing stuff.
>
> Regards,
> Tommaso
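
The wrapping-and-queueing idea described in the quoted message can be sketched roughly as follows. This is not the SOLR-5826 patch and does not use SolrJ; `IndexSink` and `QueueingSink` are hypothetical names, and a request is reduced to a plain string. The sketch only shows the ordering guarantee: a failed request stays at the head of a FIFO queue, and newer requests wait behind it until the backend accepts requests again.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for whatever actually sends an indexing request to Solr.
interface IndexSink {
    void send(String request) throws IOException;
}

// Wraps another sink. Requests that fail with an I/O error are kept in FIFO
// order and replayed before any newer request, so the original order is preserved.
final class QueueingSink implements IndexSink {
    private final IndexSink delegate;
    private final Deque<String> pending = new ArrayDeque<>();

    QueueingSink(IndexSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void send(String request) {
        pending.addLast(request);   // always enqueue first, then try to flush in order
        drain();
    }

    // Tries to flush the backlog; stops at the first failure and keeps the rest queued.
    synchronized void drain() {
        while (!pending.isEmpty()) {
            try {
                delegate.send(pending.peekFirst());
            } catch (IOException e) {
                return;             // backend still unreachable, retry on a later call
            }
            pending.removeFirst();  // remove only after the delegate accepted it
        }
    }

    synchronized int backlog() {
        return pending.size();
    }
}
```

A real version would presumably also call drain() from a background thread on a timer, and would have to decide how large the backlog may grow and whether it must survive a restart; none of that is covered here.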
Re: solr IDF based filtering response
Requesting your help on this. I am sure there should be some way to do it, that is, some way to limit the results based on relevance. Please help.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-IDF-based-filtering-response-tp4121271p4122268.html
Re: How to apply Semantic Search in Solr
Basically, when I searched for it on Google I got this result:
http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/

I am working on this. Is it useful?

On Sat, Mar 8, 2014 at 3:11 PM, Alexandre Rafalovitch wrote:
> [...]

--
Regards,
*Sohan Kalsariya*
Re: organize folder inside Solr
You should consider taking a basic training course in Solr, and/or reading one of the available introductory books. Or even reading the introduction in my e-book:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-----Original Message-----
From: blach
Sent: Saturday, March 8, 2014 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: organize folder inside Solr

[...]
Re: How to apply Semantic Search in Solr
Thanks for sharing this link Sohan, it's an interesting approach. Since you have effectively defined what you mean by Semantic Search, there are a couple of other approaches I know of to do something like this:

1) Preprocess your documents looking for terms that co-occur in the same document. The more such co-occurrences you find, the more strongly these terms are related (this can help with ordering related terms from most related to least related). At query time, expand the query to include the /most/ related concepts and search.

2) Use an external knowledge base such as a taxonomy that indicates relationships between concepts (this is the approach we use). At query time, expand the query to include related concepts and search.

-sujit

On Sat, Mar 8, 2014 at 8:21 AM, Sohan Kalsariya wrote:
> [...]
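
Both approaches in the reply above end with the same query-time step: expand the user's terms with related concepts. A minimal sketch of that step, assuming the relatedness map has already been built (from co-occurrence counts or from a taxonomy), might look like this. `QueryExpander` and its methods are hypothetical names, and the related terms are kept to single words so that no phrase quoting is needed in the resulting query string.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical query-time expander; the "related" map is assumed to come from
// an offline step (co-occurrence analysis or an external taxonomy).
final class QueryExpander {
    private final Map<String, List<String>> related;

    QueryExpander(Map<String, List<String>> related) {
        this.related = related;
    }

    // Returns each query term immediately followed by its related concepts,
    // lower-cased, in first-seen order and without duplicates.
    Set<String> expand(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            terms.add(term);
            terms.addAll(related.getOrDefault(term, Collections.<String>emptyList()));
        }
        return terms;
    }

    // Joins the expanded terms into a simple OR query string.
    String toOrQuery(String query) {
        return String.join(" OR ", expand(query));
    }
}
```

With a map that relates "musical" to "music" and "concert", the query "Musical Events" would be sent to the search engine as a disjunction of all four terms. Weighting the added terms lower than the original ones is a likely refinement, but it is not shown here.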
Re: Indexing huge data
Thanks for all the responses so far. Test runs so far do not suggest any bottleneck in Solr yet as I continue to work on different approaches. Collecting the data from different sources seems to be consuming most of the time.

On 3/7/14, 5:53 PM, Erick Erickson wrote:
Kranti and Susheel's approaches are certainly reasonable assuming I bet right :).

Another strategy is to rack together N indexing programs that simultaneously feed Solr.

In any of these scenarios, the end goal is to get Solr using up all the CPU cycles it can, _assuming_ that Solr isn't the bottleneck in the first place.

Best,
Erick

On Thu, Mar 6, 2014 at 6:38 PM, Kranti Parisa wrote:
That's what I do: precreate JSONs following the schema and save them in MongoDB; this is part of the ETL process. After that, just dump the JSONs into Solr using batching etc. With this you can do full and incremental indexing as well.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Thu, Mar 6, 2014 at 9:57 AM, Rallavagu wrote:
Yeah. I have thought about spitting out JSON and running it against Solr using parallel HTTP threads separately. Thanks.

On 3/5/14, 6:46 PM, Susheel Kumar wrote:
One more suggestion is to collect/prepare the data in CSV format (1-2 million sample depending on size) and then import the data directly into Solr using the CSV handler & curl. This will give you the pure indexing time & the differences.

Thanks,
Susheel

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data

Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and any commits you're doing from SolrJ.

My bet: Your program will run at about the same speed it does when you actually index the docs, indicating that your problem is in the data acquisition side. Of course the older I get, the more times I've been wrong :).

You can also monitor the CPU usage on the box running Solr. I often see it idling along < 30% when indexing, or even < 10%, again indicating that the bottleneck is on the acquisition side.

Note I haven't mentioned any solutions, I'm a believer in identifying the _problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky wrote:
Make sure you're not doing a commit on each individual document add. Committing every few minutes, or every few hundred or few thousand documents, is sufficient. You can set up auto commit in solrconfig.xml.

-- Jack Krupansky

-----Original Message-----
From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data

All,

Wondering about best practices/common practices to index/re-index a huge amount of data in Solr. The data is about 6 million entries in the db and other sources (the data is not located in one resource). I am trying a SolrJ-based solution to collect data from different resources and index it into Solr. It takes hours to index into Solr.

Thanks in advance
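
The batching advice in this thread (send documents in groups rather than adding and committing one at a time) can be sketched without any Solr dependency. `BatchingIndexer` is a hypothetical name, and the `sendBatch` callback stands in for whatever actually ships a list of documents, for example a single SolrJ add call for the whole list; that mapping is an assumption, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer that hands documents to a callback in fixed-size batches.
final class BatchingIndexer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sendBatch;   // assumed to ship one whole batch
    private final List<T> buffer = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<T>> sendBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sendBatch = sendBatch;
    }

    // Buffers one document and flushes once the batch is full.
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once more at the end of the run.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sendBatch.accept(new ArrayList<>(buffer)); // pass a copy so the callback may keep it
        buffer.clear();
    }
}
```

Replacing the callback with a no-op is one way to run the experiment suggested earlier in the thread: if the program takes about as long with nothing being sent, the time is going into data acquisition rather than indexing.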
volatile write to make isCleaning visible at ConcurrentLRUCache
Hi;

The ConcurrentLRUCache class has these lines:

    ...
    long oldestEntry = this.oldestEntry;
    isCleaning = true;
    this.oldestEntry = oldestEntry; // volatile write to make isCleaning visible
    ...

What does that assignment do, and how does it make isCleaning visible?

Thanks;
Furkan KAMACI
Re: volatile write to make isCleaning visible at ConcurrentLRUCache
On Sat, Mar 8, 2014 at 2:33 PM, Furkan KAMACI wrote:
> [...]
> What does that assignment and so makes isCleaning visible?

It's called piggy-backing...
All changes before a volatile write will be visible to another thread after reading that volatile variable.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
Re: volatile write to make isCleaning visible at ConcurrentLRUCache
On 3/8/2014 12:50 PM, Yonik Seeley wrote:
> [...]
> It's called piggy-backing...
> All changes before a volatile write will be visible to another thread
> after reading that volatile variable.

Is there any kind of testing we can put in Lucene or Solr that can detect if a future version of Java changes in a way that breaks this?

Do we have any idea whether this side effect of volatile access is part of the Java specification or simply an exploitable side effect of current implementations? If it's the latter, perhaps we need to locate and comment uses like this in a way that can be easily found later.

Thanks,
Shawn
Re: volatile write to make isCleaning visible at ConcurrentLRUCache
On Sat, Mar 8, 2014 at 3:28 PM, Shawn Heisey wrote:
> Do we have any idea whether this side effect of volatile access is part
> of the Java specification

Yep, it's part of the JMM (Java Memory Model) and is guaranteed behavior.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
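
The piggy-backing discussed in this thread can be shown in isolation with a small sketch. It is not the ConcurrentLRUCache code; `PiggybackDemo` and its fields are hypothetical. A plain field is written first and a volatile field second; a thread whose volatile read observes that second write is then guaranteed by the happens-before rules of the Java Memory Model to see the plain write as well.

```java
// Hypothetical demo class, not taken from Solr.
final class PiggybackDemo {
    private boolean ready;        // plain (non-volatile) field
    private volatile long stamp;  // volatile field used as the publication point

    // Writer side: the plain write comes before the volatile write in program order.
    void publish() {
        ready = true;   // plain write
        stamp = 1L;     // volatile write: publishes everything written before it
    }

    // Reader side: returns null while the volatile write has not been observed.
    // Once it has been observed, the plain field must read as true.
    Boolean observe() {
        if (stamp == 1L) {   // volatile read that sees the writer's volatile write
            return ready;    // visible because of the happens-before edge
        }
        return null;
    }
}
```

The guarantee only runs in this direction: a reader that looks at the plain field without first reading the volatile field gets no visibility promise, which would explain why the Solr snippet performs the otherwise redundant write to oldestEntry.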