date:20140308

How to apply Semantic Search in Solr

2014-03-08 Thread Sohan Kalsariya

Hello,

I am working on an event listing and promotions website
(http://allevents.in) and I want to apply semantic search in Solr.
For example, if someone searches for:

"Musical Events in New York"

it should return results such as:

 * Musical Night at ABC place
 * Concerts Events
 * Classical Music Event

That is, all results should be semantically related to the search query, not
ranked only by "tf-idf". Could you please explain how I should proceed to
apply semantic search in Solr?

-- 
Regards,
*Sohan Kalsariya*

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Alexandre Rafalovitch

And how would it know to give you those results? Obviously, you have
some sort of magic/algorithm in your mind. Are you doing geographic
location match, category match, synonyms match?

We can't really help with generic questions. You still need to figure
out what "semantic" means for you specifically.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya
 wrote:
> Hello,
>
> I am working on an event listing and promotions website(
> http://allevents.in) and I want to apply semantic search on solr.
> For example, if someone search :
>
> "Musical Events in New York"
> So it would give me results such as :
>
>  * Musical Night at ABC place
>  * Concerts Events
>  * Classical Music Event
> I mean all results should be Semantic to the Search_Query it should not
> give the results only based on "tf-idf". So can you please make me
> understand how do i proceed to apply Semantic Search in Solr. ( allevents.in)
>
> --
> Regards,
> *Sohan Kalsariya*

organize folder inside Solr

2014-03-08 Thread blach

Hello, 
I'm a beginner with Apache Solr.
My task is to organize folders inside Solr.
I've read a bit about collections and cores. What I don't understand is why
every document inside a collection is in XML or JSON.
How can I put my folder inside Solr? Should I create another collection and
put my data (converted to XML) into it?
Please guide me, I'm lost.

Best regards.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: organize folder inside Solr

2014-03-08 Thread Erick Erickson

Well, a couple of things:

1> Solr does NOT index documents in XML, that
 is just the input format. Well, one of the input
 formats. Internally there's a complex inverted
 index storage format.
2> What do you mean "organize into folders"? The
 common way is just to put them all into a single
 core and also index a field with the path to the file.
 You can then do things like "show all files in
 folder X" by adding an fq=filepath:"path/to/folder/x"
 to your query.

Also look at PathHierarchyTokenizerFactory for
interesting ways to get partial paths. With it, in the example above
you'd use fq=filepath:"path/to" to get everything in the
tree below "path/to".
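
As a rough illustration of why a prefix filter such as fq=filepath:"path/to"
can match everything below that folder, the sketch below mimics in plain Java
the kind of tokens a path-hierarchy tokenizer emits at index time. The names
PathPrefixes, prefixes and matchesFilter are made up for this example and are
not Solr APIs; the real tokenizer has more options and may behave differently
in edge cases.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: imitates path-hierarchy tokenization, it does not call Solr.
final class PathPrefixes {

    // Returns every ancestor prefix of the path, shortest first, ending with the full path.
    static List<String> prefixes(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int from = 0;
        while (true) {
            int idx = path.indexOf(delimiter, from);
            if (idx < 0) {
                if (!path.isEmpty()) {
                    tokens.add(path); // the full path is always the last token
                }
                return tokens;
            }
            if (idx > 0) {
                tokens.add(path.substring(0, idx));
            }
            from = idx + 1;
        }
    }

    // A filter value matches a document when it equals one of the indexed prefix tokens.
    static boolean matchesFilter(String indexedPath, String filterValue) {
        return prefixes(indexedPath, '/').contains(filterValue);
    }

    public static void main(String[] args) {
        System.out.println(prefixes("path/to/folder/x", '/'));
        System.out.println(matchesFilter("path/to/folder/x", "path/to")); // file below path/to
        System.out.println(matchesFilter("path/other/y", "path/to"));     // file elsewhere
    }
}
```

Because each document carries all of its ancestor paths as tokens, the filter
is an exact term match rather than a wildcard scan.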

But this really sounds like an XY problem. You've asked
for information about cores without clearly stating what
problem you're trying to solve. How are people intending
to _use_ the search app you're going to build?

Best,
Erick

On Sat, Mar 8, 2014 at 7:01 AM, blach  wrote:
> Hello,
> I'm beginner in Apache Solr,
> My task is to organize folders inside the Solr
> I've read a bit about collections, cores, and all that, what I don't
> understand is why every document inside the collection is in XML or Json?
> how can I put my folder inside Solr, should I create another collection, and
> put my converted data (to xml) into it?
> please guide me I'm lost.
>
> Best regards.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Jack Krupansky

You will need to implement a semantic/text classification/categorization 
algorithm and annotate each document with additional fields for the 
categories before presenting it to Solr.


For a couple of examples, see:
http://www.slideshare.net/lucenerevolution/text-classification-with-lucenesolr-apache-hadoop-and-libsvm
http://www.slideshare.net/teofili/text-categorization-with-lucene-and-solr

Or google for "text classification" or "text categorization" and "solr".

-- Jack Krupansky

-----Original Message-----
From: Sohan Kalsariya

Sent: Saturday, March 8, 2014 4:27 AM
To: solr-user@lucene.apache.org
Subject: How to apply Semantic Search in Solr

Hello,

I am working on an event listing and promotions website(
http://allevents.in) and I want to apply semantic search on solr.
For example, if someone search :

"Musical Events in New York"
So it would give me results such as :

* Musical Night at ABC place
* Concerts Events
* Classical Music Event
I mean all results should be Semantic to the Search_Query it should not
give the results only based on "tf-idf". So can you please make me
understand how do i proceed to apply Semantic Search in Solr. ( 
allevents.in)


--
Regards,
*Sohan Kalsariya*

Re: Solrj Backward Compatibility After 4.5.1

2014-03-08 Thread Furkan KAMACI

I've added a comment at that issue.

Thanks;
Furkan KAMACI


2014-03-07 21:30 GMT+02:00 Shawn Heisey :

> On 3/7/2014 11:58 AM, Furkan KAMACI wrote:
> > Hi;
> >
> > I have  a cluster as SolrCloud of 4.5.1 When I use a Solrj version
> greater
> > than 4.5.1 I get an error when deleting a document via CloudSolrServer of
> > Solrj. When I change the version to 4.5.1 as it works as usual.
> >
> > I know that I should use same versions to avoid compatibility issues.
> > However I think that there should/may be backward compatibility between
> > Solr 4.x versions. If not it would be nice to write down compatibility
> list
> > somewhere else.
> >
> > My question is that: What has changed after 4.5.1 so I can not delete a
> > document with higher versions of Solrj when using CloudSolrServer?
>
> There was a bug for backwards compatibility problems in recent SolrJ.
>
> https://issues.apache.org/jira/browse/SOLR-5762
>
> That bug is resolved as of 4.7, but recently it has come to our
> attention that the problem is not entirely fixed, and appears to be a
> bigger issue than we thought.
>
> I don't recall seeing a new bug filed yet.  I don't fully understand the
> latest problems, or I would have already filed a new bug.
>
> Thanks,
> Shawn
>
>

Re: organize folder inside Solr

2014-03-08 Thread blach

Well, for now, I just want to put some data (binary, PDF, JPEG, ... anything)
inside Solr.
Should I put the files in by hand (copy/paste) inside
\solr-4.7.0\example\solr\collection1,
or is there another way to do it?

thanks.
Med.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122219.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: organize folder inside Solr

2014-03-08 Thread Erick Erickson

You really have to back up and do your
homework here. Please work through
the tutorial below, it'll help clear up some
of your confusion. For instance, you feed
documents _to_ solr, after you've defined
a schema, figured out your use-cases, etc.
You don't just stick documents somewhere,
turn Solr on and magically be able to search
them.

https://lucene.apache.org/solr/4_7_0/tutorial.html

Best,
Erick

On Sat, Mar 8, 2014 at 8:44 AM, blach  wrote:
> Well, for now, I just want to put some data (binary, PDF, JPEG, ... any)
> inside solr,
> should I put them by hand (copy/past) inside
> \solr-4.7.0\example\solr\collection1,
> or there another way to do it.
>
> thanks.
> Med.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122219.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: What is mean by Index Searcher?

2014-03-08 Thread Furkan KAMACI

Hi;

At this point I suggest reading this article:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks;
Furkan KAMACI


2014-03-07 10:44 GMT+02:00 Alexandre Rafalovitch :

> Some events close and reopen the searcher. Commit is the main one
> during lifetime of Solr server. So, you can read this "until commit".
> Of course, you have soft and hard commits with settings to reopen or
> not reopen the searcher, so you may want to read up on that if you are
> trying to understand this all completely.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Fri, Mar 7, 2014 at 3:25 PM, search engn dev
>  wrote:
> > Thanks Alex,
> >
> > But what is mean by "...lifetime of that searcher." Is is lifetime of any
> > particular query or what.?
> >
> > Sorry but i am not able to understand this. :(
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/What-is-mean-by-Index-Searcher-tp4121898p4121912.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: SolrCloud with Tomcat

2014-03-08 Thread Furkan KAMACI

Hi;

Could you check here:
http://lucene.472066.n3.nabble.com/Error-when-creating-collection-in-Solr-4-6-td4103536.html

Thanks;
Furkan KAMACI


2014-03-07 9:44 GMT+02:00 Vineet Mishra :

> Hi
>
> I am installing SolrCloud with 3 External
> Zookeeper(localhost:2181,localhost:2182,localhost:2183) and 2
> Tomcats(localhost:8181,localhost:8182) all available on a single
> Machine(Just for getting started).
> By Following these links
>
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> http://wiki.apache.org/solr/SolrCloudTomcat
>
> I have got the Solr UI on the machine pointing to
>
> http://localhost:8181/solr/#/~cloud
>
> In the Cloud Graph View it is coming with
>
> mycollection
> |
> |_ shard1
> |_ shard2
>
> But both the shards are empty and showing no cores or replica.
>
> Following the blog post at
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> I was successful up to starting Tomcat, but
> after the section "Creating Collection, Shard(s), Replica(s) in
> SolrCloud" I am facing the problem.
>
> Giving command to create replica for the shard using
>
> curl 'http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1'
>
> it is giving error
>
> 
> status: 400, QTime: 137
> *Error CREATEing SolrCore 'shard1-replica-2':
> 192.168.2.183:8182_solr_shard1-replica-2 is removed*
> error code: 400
> 
>
> Has anybody run into this issue?
>
> Regards
>

Re: Partial Counts in SOLR

2014-03-08 Thread Salman Akram

The issue with timeAllowed is that you never know whether it will return the
minimum number of docs you need.

I do want docs sorted by date, but it seems it's not possible to have Solr
start searching from the most recent docs and stop after finding a certain
number of them. Is there any other tweak?

Thanks


On Saturday, March 8, 2014, Chris Hostetter 
wrote:

>
> : Reason: In an index with millions of documents I don't want to know that
> a
> : certain query matched 1 million docs (of course it will take time to
> : calculate that). Why don't just stop looking for more results lets say
> : after it finds 100 docs? Possible??
>
> but if you care about sorting, ie: you want the top 100 documents sorted
> by score, or sorted by date, you still have to "collect" all 1 million
> matches in order to know what the first 100 are.
>
> if you really don't care about sorting, you can use the "timeAllowed"
> option to tell the searching method to do the best job it can in an
> (approximated) limited amount of time, and then pretend that the docs
> collected so far represent the total number of matches...
>
>
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
>
>
> -Hoss
> http://www.lucidworks.com/
>


-- 
Regards,

Salman Akram
Project Manager - Intelligize
NorthBay Solutions
410-G4 Johar Town, Lahore
Off: +92-42-35290152

Cell: +92-302-8495621

Re: howto count total word amount of all documents in solr index?

2014-03-08 Thread Furkan KAMACI

Hi;

Do you want this?
http://localhost:8983/solr/#/collection1/schema-browser?field=text_general

Thanks;
Furkan KAMACI


2014-03-07 10:48 GMT+02:00 cqlangyi :

> hi there,
>
>
> i have following questions, please help me out, very appreciate.
>
> say i have a field configured as "text_general" type, and indexed 3 pieces
> content as documents.
> 1. "today is a good day"
> 2. "call your family every day"
> 3. "come with me"
>
>
> how could i count the total (even roughly) word amount in these 3
> documents, with the above the
> result should be "13" at max or something a little less if the stopwords
> enabled.
>
>
> thanks a lot.
>
>
> Cq

Re: organize folder inside Solr

2014-03-08 Thread blach

Thanks, I followed it carefully.
The example in the tutorial indexes only XML files, and that is my
problem:
I want my search engine to handle other formats like pictures, music, PDF,
and so on.
I'm working just on "Collection1".

Med. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122242.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: organize folder inside Solr

2014-03-08 Thread Rafał Kuć

Hello!

There are multiple tutorials on how to do this, for example:

1. http://solr.pl/en/2011/03/21/solr-and-tika-integration-part-1-basics/
2. http://wiki.apache.org/solr/ExtractingRequestHandler

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


> Thanks, I followed it carefully, 
> the example in the tutorial is indexing only Xml files, and that is my
> problem,
> I want my search engine to look for other formats like pictures, music, PDF,
> and so on.
> and I'm working just on " Collection1 ",

> Med. 



> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122242.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Caching requests to Solr

2014-03-08 Thread Tommaso Teofili

following up on this, I've created
https://issues.apache.org/jira/browse/SOLR-5826 , with a draft patch.
Regards,
Tommaso


2014-03-05 8:50 GMT+01:00 Tommaso Teofili :

> Hi all,
>
> I have the following requirement where I have an application talking to
> Solr via SolrJ where I don't know upfront which type of Solr instance that
> will be communicating with, while this is easily solvable by using
> different SolrServer implementations I also need a way to ensure that all
> the indexing requests will go through in the correct order even if the Solr
> instance(s) will be down for a while. This means that if the Solr instance
> / cluster is down I need to cache the requests e.g. in an ordered queue and
> let them be processed out of the queue as soon as the instance / cluster
> comes up again.
> For this I was thinking to implementing a wrapping SolrServer which takes
> the "root" SolrServer as a parameter and delegates all the requests to it
> while it keeps a queue where all the (indexing) requests start going as
> soon as one is failing due to a IO / Connection issue and that gets
> continuously processed in order to pull requests out as soon as it's
> possible to communicate again with the Solr instance / cluster.
> I wonder then if there's any other approach you can think of to handle
> this maybe leveraging existing stuff.
>
> Regards,
> Tommaso
>
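
The wrapping approach described in the quoted message can be sketched roughly
as below. This is not the SOLR-5826 draft patch and it does not use SolrJ: the
Sender interface is a hypothetical stand-in for the "root" server, and
QueueingSender is a made-up name. The sketch keeps failed requests in an
in-memory queue and replays them oldest-first, stopping at the first failure
so that ordering is preserved.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: an order-preserving retry queue around a delegate.
final class QueueingSender<T> {

    // Hypothetical stand-in for the wrapped server; not the SolrJ SolrServer API.
    interface Sender<T> {
        void send(T request) throws IOException;
    }

    private final Sender<T> delegate;
    private final Deque<T> pending = new ArrayDeque<>();

    QueueingSender(Sender<T> delegate) {
        this.delegate = delegate;
    }

    // Queues the request, then tries to drain the queue. Returns how many requests remain queued.
    synchronized int submit(T request) {
        pending.addLast(request);
        return flush();
    }

    // Sends queued requests oldest-first and stops at the first I/O failure.
    synchronized int flush() {
        while (!pending.isEmpty()) {
            try {
                delegate.send(pending.peekFirst());
            } catch (IOException e) {
                break; // backend unreachable: keep this and all later requests queued
            }
            pending.removeFirst();
        }
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        boolean[] up = {false};
        QueueingSender<String> sender = new QueueingSender<String>(request -> {
            if (!up[0]) {
                throw new IOException("backend down");
            }
            received.add(request);
        });
        sender.submit("add doc 1");
        sender.submit("add doc 2");
        up[0] = true;              // backend comes back
        sender.submit("add doc 3");
        System.out.println(received);
    }
}
```

A real implementation would also need a background retry loop and a durable
queue, since an in-memory queue is lost if the application restarts.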

Re: solr IDF based filtering response

2014-03-08 Thread GaneshSe

I would appreciate your help with this. I am sure there must be some way to
limit the results based on relevance. Please
help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-IDF-based-filtering-response-tp4121271p4122268.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Sohan Kalsariya

Basically, when I searched on Google I found this:

http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/

I am working through it now.

Is this approach useful?


On Sat, Mar 8, 2014 at 3:11 PM, Alexandre Rafalovitch wrote:

> And how would it know to give you those results? Obviously, you have
> some sort of magic/algorithm in your mind. Are you doing geographic
> location match, category match, synonyms match?
>
> We can't really help with generic questions. You still need to figure
> out what "semantic" means for you specifically.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya
>  wrote:
> > Hello,
> >
> > I am working on an event listing and promotions website(
> > http://allevents.in) and I want to apply semantic search on solr.
> > For example, if someone search :
> >
> > "Musical Events in New York"
> > So it would give me results such as :
> >
> >  * Musical Night at ABC place
> >  * Concerts Events
> >  * Classical Music Event
> > I mean all results should be Semantic to the Search_Query it should not
> > give the results only based on "tf-idf". So can you please make me
> > understand how do i proceed to apply Semantic Search in Solr. (
> allevents.in)
> >
> > --
> > Regards,
> > *Sohan Kalsariya*
>



-- 
Regards,
*Sohan Kalsariya*

Re: organize folder inside Solr

2014-03-08 Thread Jack Krupansky

You should consider taking a basic training course in Solr, and/or reading 
one of the available introductory books. Or even reading the introduction in 
my e-book:


http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-----Original Message-----
From: blach

Sent: Saturday, March 8, 2014 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: organize folder inside Solr

Well, for now, I just want to put some data (binary, PDF, JPEG, ... any)
inside solr,
should I put them by hand (copy/past) inside
\solr-4.7.0\example\solr\collection1,
or there another way to do it.

thanks.
Med.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/organize-folder-inside-Solr-tp4122207p4122219.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Sujit Pal

Thanks for sharing this link, Sohan; it's an interesting approach. Since you
have effectively defined what you mean by semantic search, there are a couple
of other approaches I know of to do something like this:
1) preprocess your documents looking for terms that co-occur in the same
document. The more such cooccurrences you find the more strongly these
terms are related (can help with ordering related terms from most related
to least related). At query time expand the query to include /most/ related
concepts and search.
2) use an external knowledgebase such as a taxonomy that indicates
relationships between concepts (this is the approach we use). At query time
expand the query to include related concepts and search.
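
A minimal sketch of approach 1, assuming whitespace/punctuation tokenization
and document-level co-occurrence counts. The class CooccurrenceExpander and
its methods are hypothetical names for illustration, not Solr or Lucene APIs.
A real setup would at least drop stop words and normalize the counts so that
very frequent terms do not dominate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: counts term pairs per document and expands queries with related terms.
final class CooccurrenceExpander {

    // counts.get(a).get(b) = number of documents that contain both a and b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void addDocument(String text) {
        Set<String> terms = new TreeSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        terms.remove(""); // split() can yield an empty leading token
        for (String a : terms) {
            for (String b : terms) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Up to `limit` terms seen most often with `term`, strongest first, ties in alphabetical order.
    List<String> related(String term, int limit) {
        Map<String, Integer> row = counts.getOrDefault(term.toLowerCase(), Map.of());
        List<String> ranked = new ArrayList<>(row.keySet());
        ranked.sort(Comparator.comparing((String t) -> -row.get(t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    // Builds an OR query from each query term plus its most related terms.
    String expand(String query, int perTerm) {
        Set<String> clauses = new LinkedHashSet<>();
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            clauses.add(term);
            clauses.addAll(related(term, perTerm));
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        CooccurrenceExpander expander = new CooccurrenceExpander();
        expander.addDocument("Classical music concert");
        expander.addDocument("Jazz music concert");
        expander.addDocument("Music festival");
        System.out.println(expander.expand("music", 2));
    }
}
```

The same expand step would fit approach 2 as well, with the related terms
looked up in a taxonomy instead of being derived from counts.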

-sujit

On Sat, Mar 8, 2014 at 8:21 AM, Sohan Kalsariya wrote:

> Basically, when i searched it on Google I got this result :
>
>
> http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
>
> And I am working on this.
>
> So is this useful ?
>
>
> On Sat, Mar 8, 2014 at 3:11 PM, Alexandre Rafalovitch  >wrote:
>
> > And how would it know to give you those results? Obviously, you have
> > some sort of magic/algorithm in your mind. Are you doing geographic
> > location match, category match, synonyms match?
> >
> > We can't really help with generic questions. You still need to figure
> > out what "semantic" means for you specifically.
> >
> > Regards,
> >Alex.
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all
> > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> >
> >
> > On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya
> >  wrote:
> > > Hello,
> > >
> > > I am working on an event listing and promotions website(
> > > http://allevents.in) and I want to apply semantic search on solr.
> > > For example, if someone search :
> > >
> > > "Musical Events in New York"
> > > So it would give me results such as :
> > >
> > >  * Musical Night at ABC place
> > >  * Concerts Events
> > >  * Classical Music Event
> > > I mean all results should be Semantic to the Search_Query it should not
> > > give the results only based on "tf-idf". So can you please make me
> > > understand how do i proceed to apply Semantic Search in Solr. (
> > allevents.in)
> > >
> > > --
> > > Regards,
> > > *Sohan Kalsariya*
> >
>
>
>
> --
> Regards,
> *Sohan Kalsariya*
>

Re: Indexing huge data

2014-03-08 Thread Rallavagu

Thanks for all the responses so far. Test runs so far do not suggest any 
bottleneck in Solr, though I continue to work on different approaches. 
Collecting the data from the different sources seems to be consuming most of 
the time.


On 3/7/14, 5:53 PM, Erick Erickson wrote:

Kranti and Susheel's approaches are certainly
reasonable assuming I bet right :).

Another strategy is to rack together N
indexing programs that simultaneously
feed Solr.

In any of these scenarios, the end goal is to get
Solr using up all the CPU cycles it can, _assuming_
that Solr isn't the bottleneck in the first place.

Best,
Erick

On Thu, Mar 6, 2014 at 6:38 PM, Kranti Parisa  wrote:

That's what I do: pre-create JSON documents following the schema and save them in
MongoDB as part of the ETL process. After that, just dump the JSON
into Solr using batching etc. With this you can do full and incremental
indexing as well.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Mar 6, 2014 at 9:57 AM, Rallavagu  wrote:


Yeah. I have thought about spitting out JSON and running it against Solr using
parallel HTTP threads separately. Thanks.


On 3/5/14, 6:46 PM, Susheel Kumar wrote:


One more suggestion is to collect/prepare the data in CSV format (1-2
million sample depending on size) and then import data direct into Solr
using CSV handler & curl.  This will give you the pure indexing time & the
differences.

Thanks,
Susheel

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data

Here's the easiest thing to try to figure out where to concentrate your
energies. Just comment out the server.add call in your SolrJ program.
Well, and any commits you're doing from SolrJ.

My bet: Your program will run at about the same speed it does when you
actually index the docs, indicating that your problem is in the data
acquisition side. Of course the older I get, the more times I've been wrong
:).

You can also monitor the CPU usage on the box running Solr. I often see
it idling along < 30% when indexing, or even < 10%, again indicating that
the bottleneck is on the acquisition side.

Note I haven't mentioned any solutions, I'm a believer in identifying the
_problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky 
wrote:


Make sure you're not doing a commit on each individual document add.
Commit every few minutes or every few hundred or few thousand
documents is sufficient. You can set up auto commit in solrconfig.xml.
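
A rough sketch of the "batch the adds, commit rarely" pattern in plain Java.
IndexClient is a hypothetical stand-in for an indexing client and is not the
SolrJ API; BatchedIndexing, indexAll and CountingClient are likewise made-up
names. The batch size of 1000 in the demo is only an example value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: sends documents in batches and commits once at the end.
final class BatchedIndexing {

    // Hypothetical stand-in for an indexing client; not the SolrJ API.
    interface IndexClient {
        void add(List<String> docs);
        void commit();
    }

    // Returns the number of add calls made for the whole run.
    static int indexAll(Iterable<String> docs, int batchSize, IndexClient client) {
        List<String> batch = new ArrayList<>(batchSize);
        int addCalls = 0;
        for (String doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                client.add(new ArrayList<>(batch));
                batch.clear();
                addCalls++;
            }
        }
        if (!batch.isEmpty()) {
            client.add(new ArrayList<>(batch)); // final partial batch
            addCalls++;
        }
        client.commit(); // one commit for the run instead of one per document
        return addCalls;
    }

    // Simple counting client used to show how many calls are made.
    static final class CountingClient implements IndexClient {
        int docsAdded;
        int commits;

        @Override
        public void add(List<String> docs) {
            docsAdded += docs.size();
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient();
        int addCalls = indexAll(Collections.nCopies(2500, "doc"), 1000, client);
        System.out.println(addCalls + " add calls, " + client.commits + " commit");
    }
}
```

With auto commit configured on the server side, the explicit commit at the
end could be dropped entirely.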

-- Jack Krupansky

-----Original Message-----
From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data


All,

I'm wondering about best/common practices for indexing or re-indexing a huge
amount of data in Solr. The data is about 6 million entries in the db
and other sources (the data is not located in one place). I'm trying a
SolrJ-based solution that collects data from the different resources and
indexes it into Solr. It takes hours to index.

Thanks in advance

volatile write to make isCleaning visible at ConcurrentLRUCache

2014-03-08 Thread Furkan KAMACI

Hi;

The ConcurrentLRUCache class has these lines:

...
long oldestEntry = this.oldestEntry;
isCleaning = true;
this.oldestEntry = oldestEntry; // volatile write to make isCleaning visible
...

What does that assignment do, and how does it make isCleaning visible?

Thanks;
Furkan KAMACI

Re: volatile write to make isCleaning visible at ConcurrentLRUCache

2014-03-08 Thread Yonik Seeley

On Sat, Mar 8, 2014 at 2:33 PM, Furkan KAMACI  wrote:
> ConcurrentLRUCache  class has that lines:
>
> ...
> long oldestEntry = this.oldestEntry;
> isCleaning = true;
> this.oldestEntry = oldestEntry; // volatile write to make isCleaning
> visible
> ...
>
> What does that assignment and so makes isCleaning visible?

It's called piggy-backing...
All changes before a volatile write will be visible to another thread
after reading that volatile variable.
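
A small self-contained sketch of the idea, assuming a plain boolean flag and a
volatile long as in the snippet above. PiggybackDemo and its methods are
hypothetical names; this is not the ConcurrentLRUCache source. The reader only
gets the visibility guarantee because it reads the volatile field first.

```java
// Illustrative only: a plain write published by a later volatile write.
final class PiggybackDemo {
    private boolean isCleaning;         // plain (non-volatile) field
    private volatile long oldestEntry;  // volatile field used as the publication point

    // Writer: plain write first, then the volatile write that publishes it.
    void startCleaning(long marker) {
        isCleaning = true;
        oldestEntry = marker; // writes made before this are visible to a thread that later reads oldestEntry
    }

    // Reader: spins on the volatile field, then reads the plain field.
    boolean sawCleaningAfter(long marker) {
        while (oldestEntry != marker) {
            Thread.onSpinWait();
        }
        return isCleaning;
    }

    // Runs one writer and one reader; returns what the reader observed for isCleaning.
    static boolean run() {
        PiggybackDemo demo = new PiggybackDemo();
        boolean[] result = new boolean[1];
        Thread reader = new Thread(() -> result[0] = demo.sawCleaningAfter(42L));
        reader.start();
        demo.startCleaning(42L);
        try {
            reader.join(); // join makes the reader's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the reader checked isCleaning without first reading oldestEntry, nothing
would guarantee that it sees the updated flag.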

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr

Re: volatile write to make isCleaning visible at ConcurrentLRUCache

2014-03-08 Thread Shawn Heisey

On 3/8/2014 12:50 PM, Yonik Seeley wrote:
> On Sat, Mar 8, 2014 at 2:33 PM, Furkan KAMACI  wrote:
>> ConcurrentLRUCache  class has that lines:
>>
>> ...
>> long oldestEntry = this.oldestEntry;
>> isCleaning = true;
>> this.oldestEntry = oldestEntry; // volatile write to make isCleaning
>> visible
>> ...
>>
>> What does that assignment and so makes isCleaning visible?
> 
> It's called piggy-backing...
> All changes before a volatile write will be visible to another thread
> after reading that volatile variable.

Is there any kind of testing we can put in Lucene or Solr that can
detect if a future version of Java changes in a way that breaks this?

Do we have any idea whether this side effect of volatile access is part
of the Java specification or simply an exploitable side effect of
current implementations?  If it's the latter, perhaps we need to locate
and comment uses like this in a way that can be easily found later.

Thanks,
Shawn

Re: volatile write to make isCleaning visible at ConcurrentLRUCache

2014-03-08 Thread Yonik Seeley

On Sat, Mar 8, 2014 at 3:28 PM, Shawn Heisey  wrote:
> Do we have any idea whether this side effect of volatile access is part
> of the Java specification

Yep, it's part of the JMM (Java Memory Model) and is guaranteed behavior.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr
