Re: Why is there no getter method for defaultCollection in CloudSolrServer?

2013-06-12 Thread Furkan KAMACI
Ok, I will create a JIRA for it.

2013/6/11 Mark Miller 

>
> On Jun 11, 2013, at 4:51 AM, Furkan KAMACI  wrote:
>
> > Why is there no getter method for defaultCollection in CloudSolrServer?
>
> Want to create a JIRA issue to add it?
>
> - Mark
>


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread Raymond Wiker
It appears that the word "bing" occurs in the title; is the title field
copied into the default search field (assuming that you even have a default
search field)? If not, you need to somehow specify the field(s) that you
want to search in.


On Wed, Jun 12, 2013 at 7:52 AM, coderslay wrote:

> Hi Jack,
>
> Thanks for the quick response.
>
> Actually I have configured Apache Nutch 1.6 along with Apache Solr 4.3.0.
> After crawling a website (for example, www.bing.com), I use the following
> command to do a Solr index:
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/*
>
> Then when I go back to the Solr admin panel and search for *.*, I get the
> following result
> 
>
> But when I search for bing, I get 0 results, as shown here in the pic
> 
>
> I don't know why I am not getting the result.
> I came across these posts describing the same issue I am having now.
>
> http://stackoverflow.com/questions/6950163/solr-index-empty-after-nutch-solrindex-command
> http://stackoverflow.com/questions/10813792/solr-admin-shows-nothing-nutch
>
> Any help will be appreciated.
>
> Regards,
> Nasir
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069847.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


What is the Difference Between Down and Gone on the Admin Cloud Page?

2013-06-12 Thread Furkan KAMACI
What is the Difference Between Down and Gone on the Admin Cloud Page?


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread coderslay
Hi Jack,

Here is my schema.xml:

My default search field is "content".

Regards,
Nasir



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069862.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread Raymond Wiker
There's your problem, then - you have "content" as the default search
field, but your copyField directives send everything to the "text" field.
If you change the default search field to "text", you should be able to
search for "bing"; otherwise, you'll need to use something like
"content:bing".


On Wed, Jun 12, 2013 at 9:38 AM, coderslay wrote:

> Hi Jack,
>
> Here is my schema.xml:
> 
> My default search field is "content".
>
> Regards,
> Nasir
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069862.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How to Use PageRank like Document Boosting at Solr?

2013-06-12 Thread Furkan KAMACI
I use Nutch to index my documents. I have a Nutch-aware schema in my Solr,
and there is a field like this:



boost holds the OPIC score of my documents (similar to Google's PageRank).
How can I boost my queries on the Solr side? I followed the wiki and tried this:

q={!boost b=boost}text:supervillians

and it says:

can not use FieldCache on a field which is neither indexed nor has doc
values: boost

There should be a convenient solution for my purpose. Instead of
adding something to the search query, maybe I should boost documents a
different way while indexing. What do you suggest?


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread coderslay
Hi Jack,

I tried doing what you told me, but I am still facing the same issue :(
Can you provide me a sample schema.xml to work it out?

Regards,
Nasir



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread Raymond Wiker
I'm not Jack, but...

... locate the line in schema.xml that says

<defaultSearchField>content</defaultSearchField>

and replace "content" with "text".

You may also have to edit solrconfig.xml if the request handler defines the
parameter "df" - this, too, should point to your default field.


On Wed, Jun 12, 2013 at 10:44 AM, coderslay wrote:

> Hi Jack,
>
> I tried doing what you told me, but I am still facing the same issue :(
> Can you provide me a sample schema.xml to work it out?
>
> Regards,
> Nasir
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: document indexing

2013-06-12 Thread sodoo
Hi all,

I am a beginner, and I am trying to index PDF, DOCX, and TXT files.
How can I index files in these formats?

I have installed the Solr server in /opt/solr.
I have also created a "documents" directory, and copied the files to index into
/opt/solr/documents.

I tried to index with the command below. It looks like the document was indexed:
I checked the log file, and the indexing was logged. But unfortunately the text
I search for is not found.

curl
"http://localhost:8983/solr/update/extract?stream.file=/opt/solr/document/Web_Hosting_Instruction.pdf&literal.id=doc1"

Please advise and assist me.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p4069871.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding pdf/word file using JSON/XML

2013-06-12 Thread Roland Everaert
1) Being aggressive and insulting is not a way to help people understand
such a complex tool, or to help people in general.

2) I read the Solr feature page again, and it states that the
interface is REST-like, not RESTful as I thought in the first place and
communicated to the devs. As the devs told me, a RESTful interface
doesn't use parameters in the URI/URL, so it is my mistake. Hence we have
no problem with the interface as it is.

Anyway, I still have a question regarding the /extract interface. It seems
that every time a file is updated in Solr, the Lucene document is recreated
from scratch, which means that any extra information we want to be
indexed/stored along with the file is erased if the request doesn't contain
it. Is there a parameter that allows changing that behaviour?



Regards,


Roland.


On Tue, Jun 11, 2013 at 4:35 PM, Jack Krupansky wrote:

> "is it possible to index the file + metadata with a JSON/XML request?"
>
> You still aren't being clear as to what you are really trying to achieve
> here. I mean, just write a shell script that does the curl command, or
> write a Java program or application layer that uses SolrJ to talk to Solr
> and accepts JSON/XML/REST requests.
>
>
> "It seems that the only way to index a file with some metadata is to build
> a
> request that would look like the following example that uses curl."
>
> Curl is just a fancy way to do an HTTP request. You can do the same HTTP
> request from Java code (or Python or whatever.)
>
>
> "The developer would like to avoid using parameters in the url to pass
> arguments."
>
> Seriously?! What is THAT all about!!  I mean, really, HTTP and URLs and
> URL query parameters are part of the heart of the Internet infrastructure!
>
> If this whole thread is merely that you have an IDIOT who can't cope with
> passing HTTP URL query parameters, all I can say is... Wow!
>
> But use SolrJ and then at least it doesn't LOOK like they are URL Query
> parameters.
>
> Or, maybe this is just a case where the developer WANTS to use SOAP rather
> than a REST style of API.
>
> In any case, please clue us in as to what PROBLEM you are really trying to
> solve. Just use plain English and avoid getting caught up in what the
> solution might be.
>
> The real bottom line is that random application developers should not be
> talking directly to Solr anyway - they should be provided with an
> "application layer" that has a clean, application-oriented REST API and the
> gory details of the Solr API would be hidden inside the application layer.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Roland Everaert
> Sent: Tuesday, June 11, 2013 8:48 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> We are working on an application that allows some users to add files (pdf,
> ms word, odt, etc), located on their local hard disk, to our internal
> system and allows other users to search for them. So we are considering
> Solr for the indexing and search functionalities of the system. Along with
> the file content, we want to index some metadata related to the file.
>
> It seems obvious that Solr couldn't import the file from the local disk of
> the user, so the system will have to import the file into a directory that
> Solr can reach and instruct Solr to index the file with the metadata, but
> is it possible to index the file + metadata with a JSON/XML request?
>
> It seems that the only way to index a file with some metadata is to build a
> request that would look like the following exemple that uses curl. The
> developer would like to avoid using parameters in the url to pass
> arguments.
>
> curl "
> http://localhost:8080/solr/**update/extract?literal.id=**
> doc10&literal.name=BLAH&**defaultField=text
> "
> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
>
> Additionally, it seems that if a subsequent request is sent to the indexer
> to update the file, if the metadata are not passed to Solr with the
> request, they are deleted.
>
> Thanks for your help,
>
>
>
> Roland.
>
>
> On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky wrote:
>
>  Sorry, but you are STILL not being clear!
>>
>> Are you asking if you can pass Solr parameters as XML fields? No.
>>
>> Are you asking if the file name and path can be indexed as metadata? To
>> some degree:
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"
>>
>> Then the stream has a name that is indexed as metadata:
>>
>>   stream_source_info: HelloWorld.docx
>>   stream_content_type: application/octet-stream

Re: Adding pdf/word file using JSON/XML

2013-06-12 Thread Gora Mohanty
On 12 June 2013 14:51, Roland Everaert  wrote:
[...]
> Anyway, I still have a question regarding the /extract interface. It seems
> that every time a file is updated in Solr, the Lucene document is recreated
> from scratch, which means that any extra information we want to be
> indexed/stored along with the file is erased if the request doesn't contain
> it. Is there a parameter that allows changing that behaviour?
[...]

You really should start a different thread for an
unrelated question.

If I understand the above correctly, what you are
looking for is partial updates. Please see, e.g.,
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
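
For reference, a minimal sketch of such a partial (atomic) update in Solr 4.x,
assuming all fields are stored and a document with id "doc1" (the field name
here is hypothetical):

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: application/json" -d '[{"id":"doc1","comments":{"set":"new value"}}]'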

Regards,
Gora


Solved: Replication not working

2013-06-12 Thread Thomas.Porocnik

Solved it now.
It was a shameful typo in the config:
I wrote pollInterfall instead of pollInterval :-)

It was never polling; I just misunderstood the logs...

Thanks

Thomas


-Original Message-
From: thomas.poroc...@der.net [mailto:thomas.poroc...@der.net] 
Sent: Tuesday, June 11, 2013 2:30 PM
To: solr-user@lucene.apache.org
Subject: RE: Replication not working

This is the log from when the slave is polling,
but only the last two times.
As you can see, the time in between is ~2 minutes.

If it's of interest, I can post the complete log from a fresh restart on.

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] 
Sent: Tuesday, June 11, 2013 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication not working

I mean the log from when polling happens from the slave, not when you issue a
command.


On Tue, Jun 11, 2013 at 5:28 PM,  wrote:

> Log on slave:
>
> 2013-06-11 13:19:08,477 8385607 INFO  [org.apache.solr.core.SolrCore]
> (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication
> params={indent=true&command=indexversions&wt=json+} status=0 QTime=0
> 2013-06-11 13:19:08,477 8385607 DEBUG
> [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:)
> Closing out SolrRequest: {indent=true&command=indexversions&wt=json+}
> 2013-06-11 13:22:27,017 8584147 INFO  [org.apache.solr.core.SolrCore]
> (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication
> params={command=indexversion} status=0 QTime=0
> 2013-06-11 13:22:27,017 8584147 DEBUG
> [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:)
> Closing out SolrRequest: {command=indexversion}
>
> -Ursprüngliche Nachricht-
> Von: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com]
> Gesendet: Dienstag, 11. Juni 2013 13:41
> An: solr-user@lucene.apache.org
> Betreff: Re: Replication not working
>
> You said polling is happening and nothing is replicated
>
> What do the logs say on slave (Set level to INFO) ?
>
>
>
>
> On Tue, Jun 11, 2013 at 4:54 PM,  wrote:
>
> > Calling indexversion on master gives:
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">0</int>
> >   </lst>
> >   <long name="indexversion">1370612995391</long>
> >   <long name="generation">53</long>
> > </response>
> >
> > On Slave:
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">0</int>
> >   </lst>
> >   <long name="indexversion">0</long>
> >   <long name="generation">1</long>
> > </response>
> >
> > > pollInterval is set to 2 minutes. It is usually long
> >
> > I know ;-)
> >
> >
> > -Ursprüngliche Nachricht-
> > Von: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com]
> > Gesendet: Dienstag, 11. Juni 2013 13:16
> > An: solr-user@lucene.apache.org
> > Betreff: Re: Replication not working
> >
> > can you check with the indexversion command on both master and slave?
> >
> > pollInterval is set to 2 minutes. It is usually long. So you may need to
> > wait 2 minutes for the replication to kick in.
> >
> >
> > On Tue, Jun 11, 2013 at 3:21 PM,  wrote:
> >
> > > Hi all,
> > >
> > >
> > >
> > > we have a setup with multiple cores, loaded via DataImportHandlers.
> > >
> > > Works fine so far.
> > >
> > > Now we are trying to get the replication working (for one core so far).
> > > But the automated replication is never happening.
> > >
> > > Manually triggered replication works!
> > >
> > >
> > >
> > > Environment:
> > >
> > > Solr 4.1 (also tried with 4.3)
> > >
> > > App-Server JBoss 4.3.
> > >
> > > Java 1.6
> > >
> > >
> > >
> > > There are two JBoss instances running on different ports on the same
> box
> > > with their own solr.home directories.
> > >
> > >
> > >
> > > Configuration is done like described in the documentation:
> > >
> > >
> > >
> > > <requestHandler name="/replication" class="solr.ReplicationHandler">
> > >   <lst name="master">
> > >     <str name="enable">${de.der.pu.solr.master.enable:false}</str>
> > >     <str name="replicateAfter">startup</str>
> > >     <str name="replicateAfter">commit</str>
> > >     <str name="replicateAfter">optimize</str>
> > >     <str name="confFiles">stopwords.txt, solrconfig.xml</str>
> > >   </lst>
> > >   <lst name="slave">
> > >     <str name="enable">${de.der.pu.solr.slave.enable:false}</str>
> > >     <str name="masterUrl">http://localhost:30006/solr/${solr.core.name}</str>
> > >     <str name="pollInterfall">00:02:00</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > >
> > >
> > > Basically it all looks fine from the admin pages.
> > >
> > >
> > >
> > > The polling from the slave is going on but nothing happens.
> > >
> > > We have tried deleting the slave index completely and restarting both
> > > servers, re-imported the master data several times, and so on...
> > >
> > >
> > >
> > > On the masters replication page I see:
> > >
> > > - replication enable: true
> > >
> > > - replicateAfter: commit, startup
> > >
> > > - confFiles: stopwords.txt, solrconfig.xml
> > >
> > >
> > >
> > > On slave side I see:
> > >
> > > - master's version: 1370612995391, generation: 53, size: 2.56 MB
> > >
> > > -master url:  http://localhost:30006/solr/contacts
> > >
> > > - polling enable: true
> > >
> > >
> > >
> > > And the master settings match those shown on the master side...
> > >
> > >
> > >
> > > When I enter
> > > http://localhost:30006/solr/contacts/replication?command=details&wt=json&indent=true
> > > in the browser, the response seems OK:
> > >

Empty solr site

2013-06-12 Thread Ophir Michaeli
Hi,
 
I'm browsing to http://localhost:8983/solr and the UI is empty, no data is
shown, even though I have a Solr server running with data.
 
Thanks,
Ophir



Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread coderslay
Apologies, Raymond, for the name.

I have tried doing that also and still the same response :(

Regards,
Nasir



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069880.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Empty solr site

2013-06-12 Thread Alexandre Rafalovitch
That's not really enough info.

However, in the latest Solr, people sometimes miss the collection selection
dropdown at the bottom left.

Have you tried selecting a collection there?

Regards,
 Alex
On 12 Jun 2013 06:01, "Ophir Michaeli"  wrote:

> Hi,
>
> I'm browsing to http://localhost:8983/solr and the UI is empty, no data is
> shown, even though I have a Solr server running with data.
>
> Thanks,
> Ophir
>
>


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread coderslay
Hi Raymond,

I was playing with it, and when I specified df=content I got the results.

Can you explain what happened here?

Regards,
Nasir



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread Raymond Wiker
Hmmm, did you restart Solr after changing the schema? And did you try
searching for content:bing (or, alternatively, setting the df parameter to
"content", without quotes)?


On Wed, Jun 12, 2013 at 12:12 PM, coderslay wrote:

> Apologies Raymond for the Name.
>
> I have tried doing that also and still the same response :(
>
> Regards,
> Nasir
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069880.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)

2013-06-12 Thread Benson Margulies
Could I have some help on the combination of these two? Right now, it
appears that I'm stuck with a finalizer to chase after native
resources in a Tokenizer. Am I missing something?


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread Raymond Wiker
It looks like I haven't paid sufficient attention to your earlier messages
- sorry. It is quite clear that "content" contains "bing", and you should
have gotten results back if the default search field were "content".

Could it be that your solrconfig.xml file sets df to a field that does not
contain "bing"?


On Wed, Jun 12, 2013 at 1:29 PM, coderslay wrote:

> Hi Raymond,
>
> I was playing with it, and when I specified df=content I got the results.
> 
>
> Can you explain me what happened here?
>
> Regards,
> Nasir
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069888.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Filtering down terms in suggest

2013-06-12 Thread Aloke Ghoshal
Barani - the fq option doesn't work.
Jason - the dynamic field option won't work due to the high number of
groups and users.



On Wed, Jun 12, 2013 at 1:12 AM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Aloke,
>
> If you do not have a factorial problem in the combination of userid and
> groupid (which I can imagine you might) you could consider creating a field
> for each combination (u1g1, u2g2) which can easily be done via dynamic
> fields.  Use CopyField to get data into these various constructs (again,
> easily configured via wildcard patterns) and then send the suggestion query
> to the right field.
>
> Obviously this will get out of hand if you have too many of these...so
> this has limits.
>
> Jason
>
> On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal  wrote:
>
> > Hi,
> >
> > Trying to find a way to filter down the suggested terms set based on the
> > term value of another indexed field?
> >
> > Let's say we have the following documents indexed in Solr:
> > userid:1, groupid:1, content:"alpha beta gamma"
> > userid:2, groupid:1, content:"alternate better garden"
> > userid:3, groupid:2, content:"altruism bent garner"
> >
> > Now a query on (with a dictionary built using terms in the content
> field):
> > q:groupid:1 AND content:al
> >
> > should suggest alpha & alternate, (not altruism, since it has a different
> > groupid).
> >
> > The option to have a separate dictionary per group gets ruled out due to
> > the high number of distinct groups (50K+).
> >
> > Kindly suggest ways to get this working.
> >
> > Thanks,
> > Aloke
>
>


Re: Not getting results when searching a term from Solr Admin

2013-06-12 Thread coderslay
Hi Raymond,

Thanks a lot. It is Appreciated :D

Regards,
Nasir



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-getting-results-when-searching-a-term-from-Solr-Admin-tp4069761p4069896.html
Sent from the Solr - User mailing list archive at Nabble.com.


Atomic Update Configurations how to?

2013-06-12 Thread Snubbel
Hello,

we are upgrading from Solr 4.0 to Solr 4.3 because we want to use Atomic
Updates.
But I think something in our configuration is not correct yet.

When updating Documents I get the following exception:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
RunUpdateProcessor has recieved an AddUpdateCommand containing a document
that appears to still contain Atomic document update operations, most likely
because DistributedUpdateProcessorFactory was explicitly disabled from this
updateRequestProcessorChain
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.atomicUpdateTestAddKG2Module(AtomicUpdateTest.java:235)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.performanceAtomicVSclassicUpdateTest(AtomicUpdateTest.java:45)

What I understand from the Solr wiki is that I have to configure update
processor chains correctly. But we don't have any configured yet; do I need one?

Best regards, 
XXNS




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Atomic-Update-Configurations-how-to-tp4069900.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filtering down terms in suggest

2013-06-12 Thread Jason Hellman
Aloke,

It may be best to simply run a query to populate the suggestion list.  While 
not as fast as the terms component (and suggester offshoots) it can still be 
tuned to be very, very fast.  

In this way, you can generate any fq/q combination required to meet your needs. 
 You can play with wildcard searches, or better yet NGram (EdgeNGram) behavior 
to get the right suggestion data back.

I would suggest an additional core to accomplish this (fed via replication) to 
avoid cache entry collision with your normal queries.
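
For example, a hedged sketch of such a setup, with a hypothetical
EdgeNGram-analyzed field (content_suggest) populated via copyField from content:

  <fieldType name="text_suggest" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The suggestion request for the original example would then be something like:

  q=content_suggest:al&fq=groupid:1&fl=content&rows=10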

Hope that's useful to you.

Jason

On Jun 12, 2013, at 7:43 AM, Aloke Ghoshal  wrote:

> Barani - the fq option doesn't work.
> Jason - the dynamic field option won't work due to the high number of
> groups and users.
> 
> 
> 
> On Wed, Jun 12, 2013 at 1:12 AM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
> 
>> Aloke,
>> 
>> If you do not have a factorial problem in the combination of userid and
>> groupid (which I can imagine you might) you could consider creating a field
>> for each combination (u1g1, u2g2) which can easily be done via dynamic
>> fields.  Use CopyField to get data into these various constructs (again,
>> easily configured via wildcard patterns) and then send the suggestion query
>> to the right field.
>> 
>> Obviously this will get out of hand if you have too many of these...so
>> this has limits.
>> 
>> Jason
>> 
>> On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal  wrote:
>> 
>>> Hi,
>>> 
>>> Trying to find a way to filter down the suggested terms set based on the
>>> term value of another indexed field?
>>> 
>>> Let's say we have the following documents indexed in Solr:
>>> userid:1, groupid:1, content:"alpha beta gamma"
>>> userid:2, groupid:1, content:"alternate better garden"
>>> userid:3, groupid:2, content:"altruism bent garner"
>>> 
>>> Now a query on (with a dictionary built using terms in the content
>> field):
>>> q:groupid:1 AND content:al
>>> 
>>> should suggest alpha & alternate, (not altruism, since it has a different
>>> groupid).
>>> 
>>> The option to have a separate dictionary per group gets ruled out due to
>>> the high number of distinct groups (50K+).
>>> 
>>> Kindly suggest ways to get this working.
>>> 
>>> Thanks,
>>> Aloke
>> 
>> 



RE: Empty solr site

2013-06-12 Thread Ophir Michaeli
I'm running the "2 shards example" at http://wiki.apache.org/solr/SolrCloud.
When browsing to http://localhost:8983/solr (shard 1) I a solr UI screen 
without data, the collections combo box is empty (no collections).
Thanks

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Wednesday, June 12, 2013 2:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Empty solr site

That's not really enough info.

However in latest solr people sometimes miss the collection selection dropdown 
on bottom left.

Have you tried selecting a collection there?

Regards,
 Alex
On 12 Jun 2013 06:01, "Ophir Michaeli"  wrote:

> Hi,
>
> I'm browsing to http://localhost:8983/solr and the UI is empty, no
> data is shown, even though I have a Solr server running with data.
>
> Thanks,
> Ophir
>
>





FW: Solr and Lucene

2013-06-12 Thread Ophir Michaeli
Hi,

Which Lucene version is used with Solr 4.2.1? And is it possible to open its
index with Luke? If not, with any other tool?

Thanks


Re: How to Use PageRank like Document Boosting at Solr?

2013-06-12 Thread Michael Della Bitta
Seems like your boost field needs to be indexed.
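
A hedged sketch of such a field declaration, assuming the schema defines a
float type and Nutch's usual "boost" field name:

  <field name="boost" type="float" indexed="true" stored="true"/>
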
On Jun 12, 2013 3:49 AM, "Furkan KAMACI"  wrote:

> I use Nutch to index my documents. I have a Nutch-aware schema in my Solr,
> and there is a field like this:
>
> 
>
> boost holds the OPIC score of my documents (similar to Google's PageRank).
> How can I boost my queries on the Solr side? I followed the wiki and tried this:
>
> q={!boost b=boost}text:supervillians
>
> and it says:
>
> can not use FieldCache on a field which is neither indexed nor has doc
> values: boost
>
> There should be a convenient solution for my purpose. Instead of
> adding something to the search query, maybe I should boost documents a
> different way while indexing. What do you suggest?
>


Re: FW: Solr and Lucene

2013-06-12 Thread Rafał Kuć
Hello!

Solr 4.2.1 is using Lucene 4.2.1. Basically Solr and Lucene are
currently using the same numbers after their development was merged.

As for Luke, I think the latest version is using a beta or alpha
release of Lucene 4.0. I would try replacing the Lucene jars and see if
it works, although I haven't tried it.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi,

> Which lucene version is used with Solr 4.2.1? And is it possible to open it
> by luke? If not by any other tool? Thanks

> Thanks



Re: Atomic Update Configurations how to?

2013-06-12 Thread Jack Krupansky


Note that use of the atomic update feature requires that the Solr
transaction log be enabled in solrconfig using the <updateLog>
configuration element. For example, as in the standard Solr
example solrconfig:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Unless you have a custom distributed update request processor or have 
configured the NoOp Distributing Update Processor, Solr will automatically 
inject the Distributed Update Processor.
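
If you do define your own chain, a hedged sketch of one that keeps atomic
updates working (the chain name is hypothetical; the distributed processor
must come before RunUpdateProcessorFactory):

  <updateRequestProcessorChain name="mychain">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>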


Hint: Start with Solr 4.3 schema and solrconfig, do a diff with your config 
files, and then ONLY CAREFULLY merge in any changes to the 4.3 config files. 
In other words, DO NOT just blindly drop in config files.


-- Jack Krupansky

-Original Message- 
From: Snubbel

Sent: Wednesday, June 12, 2013 8:18 AM
To: solr-user@lucene.apache.org
Subject: Atomic Update Configurations how to?

Hello,

we are upgrading from Solr 4.0 to Solr 4.3 because we want to use Atomic
Updates.
But I think something in our configuration is not correct yet.

When updating Documents I get the following exception:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
RunUpdateProcessor has recieved an AddUpdateCommand containing a document
that appears to still contain Atomic document update operations, most likely
because DistributedUpdateProcessorFactory was explicitly disabled from this
updateRequestProcessorChain
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.atomicUpdateTestAddKG2Module(AtomicUpdateTest.java:235)
at
de.exxcellent.connect.portal.business.solr.AtomicUpdateTest.performanceAtomicVSclassicUpdateTest(AtomicUpdateTest.java:45)

What I understand from the Solr wiki is that I have to configure update
processor chains correctly. But we don't have any configured yet; do I need one?

Best regards,
XXNS




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Atomic-Update-Configurations-how-to-tp4069900.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: What is the Difference Between Down and Gone on the Admin Cloud Page?

2013-06-12 Thread Mark Miller

On Jun 12, 2013, at 3:19 AM, Furkan KAMACI  wrote:

> What is the Difference Between Down and Gone on the Admin Cloud Page?

If I remember right, Down can mean the node is still actively working towards 
something - eg, without action by you, it might go into recovering or active 
state. Gone means it has given up or disappeared. It's not likely to make 
another state change without your intervention.

- Mark

shardkey

2013-06-12 Thread Joshi, Shital
Hi,

We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple of
questions on the shard key.

1. Looking at the admin GUI, how do I know which field is being used
   as the shard key?
2. What is the default shard key used?
3. How do I override the default shard key?

Thanks. 


Solr 4.3 Spatial clustering?

2013-06-12 Thread adfel70
Hi
Is it possible to implement geo clustering in solr 4.3?
Any documentation on this topic?
Has anyone tried it?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-Spatial-clustering-tp4069941.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding pdf/word file using JSON/XML

2013-06-12 Thread Jack Krupansky
I'm sorry if I came across as aggressive or insulting - I'm only trying to 
dig down to what your actual difficulty is - and you have been making that 
extremely difficult for all of us. You need to help us all out here by more 
clearly expressing what your actual problem is. You will have to excuse the 
rest of us if we are unable to read your mind!


It sounds as if you are an intermediary between your devs and this list. 
That's NOT a very effective communications strategy! You need to either have 
your devs communicate directly on this list, or you need to do a much better 
job of understanding what their actual problem is and then communicate that 
actual problem to this list, plainly and clearly.


TRYING to read your mind (and indirectly your devs' minds as well - not an 
easy task!), and reading between the lines, it is starting to sound as if 
you (and/or your devs) are not clear on how Solr works as a "database".


Core Solr does have full CRUD (Add or Create, Read or Query, Update, and 
Delete), although not in a strict, pure REST sense, that is true.


A "full" update in Solr is the same as an Add - add a new, fresh document, 
and then delete the old document. Some people call this an "Upsert" 
(combination of Update or Insert).


There are really two forms of update (a difficulty in REST): 1) full update 
or "replace" - equal to a delete and an add, and 2) partial or incremental 
update. True REST only has the latter.


Core Solr does have support for partial or incremental Update with Atomic 
Updates. Solr will in fact retain the existing data and only update any new 
field values that are supplied on the update request.


SolrCell (Extracting RequestHandler or "/update/extract") is not a core part 
of Solr. It is an add-on "contrib" module. It does not have full CRUD - no
delete, and no partial update, but it does support add and full update.


As someone else already suggested, you can do the work of SolrCell yourself 
by calling Tika directly in your app layer and then sending normal Solr CRUD 
requests.


-- Jack Krupansky

-Original Message- 
From: Roland Everaert

Sent: Wednesday, June 12, 2013 5:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

1) Being aggressive and insulting is not a way to help people understand
such a complex tool, or to help people in general.

2) I read the Solr feature page again, and it states that the
interface is REST-like, not RESTful as I thought in the first place and
communicated to the devs. As the devs told me, a RESTful interface
doesn't use parameters in the URI/URL, so it is my mistake. Hence we have
no problem with the interface as it is.

Anyway, I still have a question regarding the /extract interface. It seems
that every time a file is updated in Solr, the Lucene document is recreated
from scratch, which means that any extra information we want to be
indexed/stored along with the file is erased if the request doesn't contain
it. Is there a parameter that allows changing that behaviour?



Regards,


Roland.


On Tue, Jun 11, 2013 at 4:35 PM, Jack Krupansky 
wrote:



"is it possible to index the file + metadata with a JSON/XML request?"

You still aren't being clear as to what you are really trying to achieve
here. I mean, just write a shell script that does the curl command, or
write a Java program or application layer that uses SolrJ to talk to Solr
and accepts JSON/XML/REST requests.


"It seems that the only way to index a file with some metadata is to build
a
request that would look like the following example that uses curl."

Curl is just a fancy way to do an HTTP request. You can do the same HTTP
request from Java code (or Python or whatever.)


"The developer would like to avoid using parameters in the url to pass
arguments."

Seriously?! What is THAT all about!!  I mean, really, HTTP and URLs and
URL query parameters are part of the heart of the Internet infrastructure!

If this whole thread is merely that you have an IDIOT who can't cope with
passing HTTP URL query parameters, all I can say is... Wow!

But use SolrJ and then at least it doesn't LOOK like they are URL Query
parameters.

Or, maybe this is just a case where the developer WANTS to use SOAP rather
than a REST style of API.

In any case, please clue us in as to what PROBLEM you are really trying to
solve. Just use plain English and avoid getting caught up in what the
solution might be.

The real bottom line is that random application developers should not be
talking directly to Solr anyway - they should be provided with an
"application layer" that has a clean, application-oriented REST API and 
the

gory details of the Solr API would be hidden inside the application layer.


-- Jack Krupansky

-Original Message- From: Roland Everaert
Sent: Tuesday, June 11, 2013 8:48 AM

To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

We are working on an application that allows some users to add files

Partial update vs full update performance

2013-06-12 Thread adfel70
Hi
As I understand, even if I use partial update, Lucene can't really update
documents. Solr will use the stored fields in order to pass the values to
Lucene, and delete and add operations will still be performed.

If this is the case, is there a performance issue when comparing partial
update to full update?

My documents have dozens of fields, most of them not stored.
I sometimes need to go through a portion of the documents and modify a
single field.
What I do right now is delete the portion I want to update and add the
documents back with the updated field.
This of course takes a lot of time (I'm talking about tens of millions of
documents).

Should I move to using partial update? Will it improve the indexing time at
all? Will it improve the indexing time to such an extent that I would be
better off storing the fields I don't need stored, just for the partial
update feature?

thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filtering down terms in suggest

2013-06-12 Thread Aloke Ghoshal
Thanks Jason, querying would be a good way to approach this. Though not
NGram - I'm thinking of doing a wildcard-based search and using the
highlighted text for suggestions.



On Wed, Jun 12, 2013 at 6:49 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Aloke,
>
> It may be best to simply run a query to populate the suggestion list.
>  While not as fast as the terms component (and suggester offshoots) it can
> still be tuned to be very, very fast.
>
> In this way, you can generate any fq/q combination required to meet your
> needs.  You can play with wildcard searches, or better yet NGram
> (EdgeNGram) behavior to get the right suggestion data back.
>
> I would suggest an additional core to accomplish this (fed via
> replication) to avoid cache entry collision with your normal queries.
>
> Hope that's useful to you.
>
> Jason
>
> On Jun 12, 2013, at 7:43 AM, Aloke Ghoshal  wrote:
>
> > Barani - the fq option doesn't work.
> > Jason - the dynamic field option won't work due to the high number of
> > groups and users.
> >
> >
> >
> > On Wed, Jun 12, 2013 at 1:12 AM, Jason Hellman <
> > jhell...@innoventsolutions.com> wrote:
> >
> >> Aloke,
> >>
> >> If you do not have a factorial problem in the combination of userid and
> >> groupid (which I can imagine you might) you could consider creating a
> field
> >> for each combination (u1g1, u2g2) which can easily be done via dynamic
> >> fields.  Use CopyField to get data into these various constructs (again,
> >> easily configured via wildcard patterns) and then send the suggestion
> query
> >> to the right field.
> >>
> >> Obviously this will get out of hand if you have too many of these...so
> >> this has limits.
> >>
> >> Jason
> >>
> >> On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal  wrote:
> >>
> >>> Hi,
> >>>
> >>> Trying to find a way to filter down the suggested terms set based on
> the
> >>> term value of another indexed field?
> >>>
> >>> Let's say we have the following documents indexed in Solr:
> >>> userid:1, groupid:1, content:"alpha beta gamma"
> >>> userid:2, groupid:1, content:"alternate better garden"
> >>> userid:3, groupid:2, content:"altruism bent garner"
> >>>
> >>> Now a query on (with a dictionary built using terms in the content
> >> field):
> >>> q:groupid:1 AND content:al
> >>>
> >>> should suggest alpha & alternate, (not altruism, since it has a
> different
> >>> groupid).
> >>>
> >>> The option to have a separate dictionary per group gets ruled out due
> to
> >>> the high number of distinct groups (50K+).
> >>>
> >>> Kindly suggest ways to get this working.
> >>>
> >>> Thanks,
> >>> Aloke
> >>
> >>
>
>


Any inputs regarding massive indexing to a cluster and search performance?

2013-06-12 Thread adfel70
Hi,
We have a multi-sharded and multi-replicated collection (solr 4.3).

When we perform massive indexing (adding 5 million records in 5k batches,
with a commit after each batch), search performance degrades a lot (a
1-second query can turn into a 4-second query).

Any rule of thumb regarding best configuration for this kind of a scenario?

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-inputs-regarding-massive-indexing-to-a-cluster-and-search-performance-tp4069955.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Partial update vs full update performance

2013-06-12 Thread Jack Krupansky

Correct.

Generally, I think most apps will benefit from partial update, especially if 
they have a lot of fields. Otherwise, they will have two round trip requests 
rather than one. Solr does the reading of existing document values more 
efficiently, under the hood, with no need to format for the response and 
parse the incoming (redundant) values.


OTOH, if the client has all the data anyway (maybe because it wants to 
display the data before update), it may be easier to do a full update.


You could do an actual performance test, but I would suggest that 
(generally) partial update will be more efficient than a full update.


And Lucene can do add and delete rather quickly, so that should not be a 
concern for modest to medium size documents, but clearly would be an issue 
for large and very large documents (hundreds of fields or large field 
values.)


-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Wednesday, June 12, 2013 10:40 AM
To: solr-user@lucene.apache.org
Subject: Partial update vs full update performance

Hi
As I understand, even if I use partial update, lucene can't really update
documents. Solr will use the stored fields in order to pass the values to
lucene, and delete and add operations will still be performed.

If this is the case is there a performance issue when comparing partial
update to full update?

My documents have dozens of fields, most of them are not stored.
I sometimes need to go through a portion of the documents and modify a
single field.
What I do right now is deleting the portion I want to update, and adding
them with the updated field.
This of course takes a lot of time (I'm talking about tens of millions of
documents).

Should I move to using partial update? will it improve the indexing time at
all? will it improve the indexing time in such extent that I would better be
storing the fields I don't need stored just for the partial update feature?

thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Any inputs regarding massive indexing to a cluster and search performance?

2013-06-12 Thread Michael Della Bitta
Hi,

My first suggestion is to not commit so often. Use autoCommit with maxTime
higher than a minute and openSearcher false. Turn on autoSoftCommit and set
that higher than 10 seconds if you can handle it. Use a mergeFactor higher
than 10 - 35, for example.
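
A hedged sketch of the corresponding solrconfig.xml settings (the times below
are placeholders to adapt; mergeFactor goes inside <indexConfig>):

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>

  <indexConfig>
    <mergeFactor>35</mergeFactor>
  </indexConfig>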

After that you're probably going to have to do some profiling to see what
the bottleneck is. Iostat helps a lot with that.


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Wed, Jun 12, 2013 at 10:50 AM, adfel70  wrote:

> Hi,
> We have a multi-sharded and multi-replicated collection (solr 4.3).
>
> When we perform massive indexing (adding 5 million records in 5k batches,
> with a commit after each batch), search performance degrades a lot (a
> 1-second query can turn into a 4-second query).
>
> Any rule of thumb regarding best configuration for this kind of a scenario?
>
> thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Any-inputs-regarding-massive-indexing-to-a-cluster-and-search-performance-tp4069955.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Any inputs regarding massive indexing to a cluster and search performance?

2013-06-12 Thread Shawn Heisey
On 6/12/2013 8:50 AM, adfel70 wrote:
> We have a multi-sharded and multi-replicated collection (solr 4.3).
> 
> When we perform massive indexing (adding 5 million records in 5k batches,
> with a commit after each batch), search performance degrades a lot (a
> 1-second query can turn into a 4-second query).
> 
> Any rule of thumb regarding best configuration for this kind of a scenario?

If it's important that your documents be visible each time you add 5000
of them, then I would switch to soft commits.  If you don't need them to
be visible until the end, then I would not send explicit commits at all
until the very end.  A middle ground - only do a soft commit after N
batches.  If N=20, that's every 100k docs.

Regardless of which choice you make in the previous paragraph, doing
periodic hard commits is very important when you have the updateLog
turned on, which is required for SolrCloud.  For that reason, I would
add autoCommit into your config with openSearcher set to false.  This
will flush the data to disk, but will not open a new searcher object, so
changes from that commit will not be visible to queries.  A hard commit
with openSearcher=false happens pretty fast.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

Exactly what to use for maxDocs and maxTime will depend on your setup.
You want to pick values large enough so commits aren't happening
constantly, but small enough so that your transaction logs don't get huge.

The rest of the wiki page that I linked has general information about
Solr performance that might be useful to you.

Thanks,
Shawn



Re: Partial update vs full update performance

2013-06-12 Thread Upayavira
My question would be, why are you updating 10m documents? Is it because
of denormalised fields? E.g. one system I have needs to reindex all data
for a publication when that publication switches between active and
inactive. 

If this is the case, you can perhaps achieve the same using joins. Store
the publications, and their status, in another core. Then, to find
documents for active publications could be:

q=harry potter&fq={!join fromIndex=pubs from=pubID
to=pubID}status:active

This would find documents containing the terms 'harry potter' which are
associated with active publications.

Changing the status of a publication would require a single document in
the 'pubs' core to be changed, rather than re-indexing all documents.

Does this hit what you are trying to achieve?

Upayavira


On Wed, Jun 12, 2013, at 03:51 PM, Jack Krupansky wrote:
> Correct.
> 
> Generally, I think most apps will benefit from partial update, especially
> if 
> they have a lot of fields. Otherwise, they will have two round trip
> requests 
> rather than one. Solr does the reading of existing document values more 
> efficiently, under the hood, with no need to format for the response and 
> parse the incoming (redundant) values.
> 
> OTOH, if the client has all the data anyway (maybe because it wants to 
> display the data before update), it may be easier to do a full update.
> 
> You could do an actual performance test, but I would suggest that 
> (generally) partial update will be more efficient than a full update.
> 
> And Lucene can do add and delete rather quickly, so that should not be a 
> concern for modest to medium size documents, but clearly would be an
> issue 
> for large and very large documents (hundreds of fields or large field 
> values.)
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: adfel70
> Sent: Wednesday, June 12, 2013 10:40 AM
> To: solr-user@lucene.apache.org
> Subject: Partial update vs full update performance
> 
> Hi
> As I understand, even if I use partial update, lucene can't really update
> documents. Solr will use the stored fields in order to pass the values to
> lucene, and delete and add operations will still be performed.
> 
> If this is the case is there a performance issue when comparing partial
> update to full update?
> 
> My documents have dozens of fields, most of them are not stored.
> I sometimes need to go through a portion of the documents and modify a
> single field.
> What I do right now is deleting the portion I want to update, and adding
> them with the updated field.
> This of course takes a lot of time (I'm talking about tens of millions of
> documents).
> 
> Should I move to using partial update? will it improve the indexing time
> at
> all? will it improve the indexing time in such extent that I would better
> be
> storing the fields I don't need stored just for the partial update
> feature?
> 
> thanks
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
> Sent from the Solr - User mailing list archive at Nabble.com. 
> 


Re: Partial update vs full update performance

2013-06-12 Thread adfel70
1. To support partial updates, I must have all the fields stored (most of
which I don't need stored).
Wouldn't I suffer in query performance if I store all these fields?

2. Can you elaborate on the large fields issue?
Why does it matter if the fields are large in the context of partial
updates?
One way or another, Lucene will index the field anyway.


Jack Krupansky-2 wrote
> Correct.
> 
> Generally, I think most apps will benefit from partial update, especially
> if 
> they have a lot of fields. Otherwise, they will have two round trip
> requests 
> rather than one. Solr does the reading of existing document values more 
> efficiently, under the hood, with no need to format for the response and 
> parse the incoming (redundant) values.
> 
> OTOH, if the client has all the data anyway (maybe because it wants to 
> display the data before update), it may be easier to do a full update.
> 
> You could do an actual performance test, but I would suggest that 
> (generally) partial update will be more efficient than a full update.
> 
> And Lucene can do add and delete rather quickly, so that should not be a 
> concern for modest to medium size documents, but clearly would be an issue 
> for large and very large documents (hundreds of fields or large field 
> values.)
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: adfel70
> Sent: Wednesday, June 12, 2013 10:40 AM
> To: 

> solr-user@.apache

> Subject: Partial update vs full update performance
> 
> Hi
> As I understand, even if I use partial update, lucene can't really update
> documents. Solr will use the stored fields in order to pass the values to
> lucene, and delete and add operations will still be performed.
> 
> If this is the case is there a performance issue when comparing partial
> update to full update?
> 
> My documents have dozens of fields, most of them are not stored.
> I sometimes need to go through a portion of the documents and modify a
> single field.
> What I do right now is deleting the portion I want to update, and adding
> them with the updated field.
> This of course takes a lot of time (I'm talking about tens of millions of
> documents).
> 
> Should I move to using partial update? will it improve the indexing time
> at
> all? will it improve the indexing time in such extent that I would better
> be
> storing the fields I don't need stored just for the partial update
> feature?
> 
> thanks
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069973.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Partial update vs full update performance

2013-06-12 Thread adfel70
Yes it is.
But in my case, these are metadata fields, and I need them to be searchable,
facetable, and sortable in the context of the main text fields.
Will I be able to achieve that if I index them in another core?


Upayavira wrote
> My question would be, why are you updating 10m documents? Is it because
> of denormalised fields? E.g. one system I have needs to reindex all data
> for a publication when that publication switches between active and
> inactive. 
> 
> If this is the case, you can perhaps achieve the same using joins. Store
> the publications, and their status, in another core. Then, to find
> documents for active publications could be:
> 
> q=harry potter&fq={!join fromIndex=pubs from=pubID
> to=pubID}status:active
> 
> This would find documents containing the terms 'harry potter' which are
> associated with active publications.
> 
> Changing the status of a publication would require a single document in
> the 'pubs' core to be changed, rather than re-indexing all documents.
> 
> Does this hit what you are trying to achieve?
> 
> Upayavira
> 
> 
> On Wed, Jun 12, 2013, at 03:51 PM, Jack Krupansky wrote:
>> Correct.
>> 
>> Generally, I think most apps will benefit from partial update, especially
>> if 
>> they have a lot of fields. Otherwise, they will have two round trip
>> requests 
>> rather than one. Solr does the reading of existing document values more 
>> efficiently, under the hood, with no need to format for the response and 
>> parse the incoming (redundant) values.
>> 
>> OTOH, if the client has all the data anyway (maybe because it wants to 
>> display the data before update), it may be easier to do a full update.
>> 
>> You could do an actual performance test, but I would suggest that 
>> (generally) partial update will be more efficient than a full update.
>> 
>> And Lucene can do add and delete rather quickly, so that should not be a 
>> concern for modest to medium size documents, but clearly would be an
>> issue 
>> for large and very large documents (hundreds of fields or large field 
>> values.)
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- 
>> From: adfel70
>> Sent: Wednesday, June 12, 2013 10:40 AM
>> To: 

> solr-user@.apache

>> Subject: Partial update vs full update performance
>> 
>> Hi
>> As I understand, even if I use partial update, lucene can't really update
>> documents. Solr will use the stored fields in order to pass the values to
>> lucene, and delete and add operations will still be performed.
>> 
>> If this is the case is there a performance issue when comparing partial
>> update to full update?
>> 
>> My documents have dozens of fields, most of them are not stored.
>> I sometimes need to go through a portion of the documents and modify a
>> single field.
>> What I do right now is deleting the portion I want to update, and adding
>> them with the updated field.
>> This of course takes a lot of time (I'm talking about tens of millions of
>> documents).
>> 
>> Should I move to using partial update? will it improve the indexing time
>> at
>> all? will it improve the indexing time in such extent that I would better
>> be
>> storing the fields I don't need stored just for the partial update
>> feature?
>> 
>> thanks
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
>> Sent from the Solr - User mailing list archive at Nabble.com. 
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069974.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Partial update vs full update performance

2013-06-12 Thread Upayavira


On Wed, Jun 12, 2013, at 04:54 PM, adfel70 wrote:
> Yes it is.
> But in my case, these are metadata fields, and I need them to be
> searchable,
> facetable, sortable in the context of the main text fields.
> Will I be able to achieve that if I index them in another core? 

Unfortunately, at this point, you can only search on them when they are
in another core, you cannot facet or sort, meaning join queries won't
work for you.

Upayavira


DIH Question

2013-06-12 Thread PeriS
When creating a new record in the db and running the delta-import command, I'm
not seeing the new record being indexed. Is there some configuration I need to
set? The use case is that the db already has records loaded and I would like to
index new records. What's the process? Any ideas, please?

Thanks
-Peri.S



Re: Partial update vs full update performance

2013-06-12 Thread Shawn Heisey
On 6/12/2013 9:50 AM, adfel70 wrote:
> 1. To support partial updates, I must have all the fields stored (most of
> which I don't need stored)
> Wouldn't I suffer in query perforemnce if I store all  these fields?
> 
> 2. Can you elaborate on the large fields issue?
> Why does it matter if the fields are large in the context of partial
> updates?
> One way or another, lucene will index the field..

Storing lots of fields does incur performance overhead due to the stored
fields compression that was added in Lucene/Solr 4.1.0.  The overhead
can be particularly bad for large fields.  You might be able to avoid
some of the problem for the query side by using the fl parameter, but I
could be wrong there, because I think that multiple fields can share the
same compressed block.  I'm not very familiar with the Lucene internals.

LUCENE-4995 will improve the situation when 4.4 comes out, but there is
still overhead.

Currently it is not possible to turn compression off.  For most typical
use cases, the compression speeds things up, but there are some
situations where it makes things worse.

Thanks,
Shawn



Re: Partial update vs full update performance

2013-06-12 Thread adfel70
Any reason why not index these metadata fields in the same core?
Would I be able to sort, facet with join queries  if the joined docs are in
the same core?


Upayavira wrote
> On Wed, Jun 12, 2013, at 04:54 PM, adfel70 wrote:
>> Yes it is.
>> But in my case, these are metadata fields, and I need them to be
>> searchable,
>> facetable, sortable in the context of the main text fields.
>> Will I be able to achieve that if I index them in another core? 
> 
> Unfortunately, at this point, you can only search on them when they are
> in another core, you cannot facet or sort, meaning join queries won't
> work for you.
> 
> Upayavira





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Question

2013-06-12 Thread Gora Mohanty
On 12 June 2013 21:34, PeriS  wrote:
> When creating a new record in the db and running the delta-import command, I'm 
> not seeing the new record being indexed. Is there some configuration I need 
> to set? The use case is that the db already has records loaded and I would 
> like to index new records. What's the process? Any ideas, please?

Please provide sufficient details as we do not have access
to your server, and there could be a million things that are
wrong.

Start by sharing your DIH configuration file, the exact URL
that you are using for doing the delta import, and the message
that you get in the browser when the delta import completes: This
will have details about how many documents were picked up.
Also, how are you checking that the new document was not
indexed.
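
For reference, delta imports generally require deltaQuery and deltaImportQuery
attributes on the DIH entity. A minimal sketch, with table and column names
assumed:

  <entity name="item"
          query="select * from item"
          deltaQuery="select id from item
                      where last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="select * from item where id='${dih.delta.id}'">
    ...
  </entity>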

Regards,
Gora


Re: Partial update vs full update performance

2013-06-12 Thread Jack Krupansky

Yes, you need to have all the fields stored to do a partial update.

Generally, not storing field values causes all sorts of headaches that far 
outweigh the modest benefit in memory savings.


Generally, make everything stored - unless you have specific and VERY 
COMPELLING need not to. Back in the early days of Lucene and Solr memory use 
was much more compelling. Now, not so much. And even if memory is an issue, 
the downside of not storing all values seems much more likely to overwhelm 
the benefits.


Sure, there are some apps where you may not want to store much if anything 
besides the key (I recall one presentation at Lucene Revolution in San 
Diego, and DataStax Enterprise does this because all the data is stored in 
Cassandra already), but generally apps would be better off biting the bullet 
and throwing memory at the problem.


And DocValues are an alternative if heap space is a critical issue.

2. Large field values are simply a potential issue since they are a lot of 
bytes to be retrieved and then re-stored.
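
For illustration, a minimal sketch of an atomic (partial) update in Solr 4.x 
via XML, assuming a core at localhost:8983/solr and a stored field named 
price:

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type: application/xml' -d '
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="price" update="set">19.99</field>
  </doc>
</add>'

All other fields on the document must be stored for their values to survive 
the update.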


-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Wednesday, June 12, 2013 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Partial update vs full update performance

1. To support partial updates, I must have all the fields stored (most of
which I don't need stored)
Wouldn't I suffer in query performance if I store all these fields?

2. Can you elaborate on the large fields issue?
Why does it matter if the fields are large in the context of partial
updates?
One way or another, lucene will index the field..


Jack Krupansky-2 wrote

Correct.

Generally, I think most apps will benefit from partial update, especially
if
they have a lot of fields. Otherwise, they will have two round trip
requests
rather than one. Solr does the reading of existing document values more
efficiently, under the hood, with no need to format for the response and
parse the incoming (redundant) values.

OTOH, if the client has all the data anyway (maybe because it wants to
display the data before update), it may be easier to do a full update.

You could do an actual performance test, but I would suggest that
(generally) partial update will be more efficient than a full update.

And Lucene can do add and delete rather quickly, so that should not be a
concern for modest to medium size documents, but clearly would be an issue
for large and very large documents (hundreds of fields or large field
values.)

-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Wednesday, June 12, 2013 10:40 AM
To: solr-user@lucene.apache.org


Subject: Partial update vs full update performance

Hi
As I understand, even if I use partial update, lucene can't really update
documents. Solr will use the stored fields in order to pass the values to
lucene, and delete/add operations will still be performed.

If this is the case is there a performance issue when comparing partial
update to full update?

My documents have dozens of fields, most of them are not stored.
I sometimes need to go through a portion of the documents and modify a
single field.
What I do right now is deleting the portion I want to update, and adding
them with the updated field.
This of course takes a lot of time (I'm talking about tens of millions of
documents).

Should I move to using partial update? will it improve the indexing time
at
all? will it improve the indexing time in such extent that I would better
be
storing the fields I don't need stored just for the partial update
feature?

thanks






--
View this message in context:
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069973.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ConstantScoreQuery

2013-06-12 Thread Subashini Soundararajan
Correction: The query was price_c:1. Can someone please explain?

Thanks
Subashini
On Tuesday, June 11, 2013, Subashini Soundararajan wrote:

> Hi,
>
> I have imported the money.xml doc in lucene -
> https://github.com/normann/apache-solr/blob/master/example/exampledocs/money.xml
>
> I tried the query: price_1:1 and got back a result containing only USD as
> the search hit. But the money.xml has Euro, pound and NOK too.
> Do we know why they are not returned as part of the result?
>
> I tried this via the solr admin interface.When I analyzed the query it
> showed the use of ConstantScoreQuery in the background.
>
> Thanks,
> Subashini
>
>
> --
> Thanks,
> Subashini
>
> Imagination is more important than knowledge
> - Einstein
>
>

-- 
Sent from Gmail Mobile


Re: Solr 4.3 Spatial clustering?

2013-06-12 Thread bbarani
check this link..

http://stackoverflow.com/questions/11319465/geoclusters-in-solr



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-Spatial-clustering-tp4069941p4069986.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: shardkey

2013-06-12 Thread bbarani
I suppose you can implement custom hashing by using the "_shard_" field. I am
not sure on this, but I have come across this approach some time back.

At query time, you can specify "shard.keys" parameter...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/shardkey-tp4069940p4069990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Question

2013-06-12 Thread PeriS
1. Current num records in db: 4
2. Created a new record and called delta-import using the following url; 
localhost:8983/dataimport?command=delta-import
SOLR Log here : http://apaste.info/gF3N
3. When i tried checking the status from the browser - logs are here: 
http://apaste.info/gxDF

db-import-config.xml is here : http://apaste.info/3t0K

Thanks
-Peri


On Jun 12, 2013, at 12:11 PM, Gora Mohanty  wrote:

> Please provide sufficient details as we do not have access
> to your server, and there could be a million things that are
> wrong.
> 
> Start by sharing your DIH configuration file, the exact URL
> that you are using for doing the delta import, and the message
> that you get in the browser when the delta import completes: This
> will have details about how many documents were picked up.
> Also, how are you checking that the new document was not
> indexed.
> 
> Regards,
> Gora



What is wrong with this blank query?

2013-06-12 Thread Shankar Sundararaju
http://localhost:8983/solr/doc1/select?q=text:()&debugQuery=on&defType=lucene

I get this error:
org.apache.solr.search.SyntaxError: Cannot parse 'text:()': Encountered "
")" ") "" at line 1, column 6. Was expecting one of: <NOT> ... "+" ... "-"
... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ...
<WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ...

Why doesn't the Lucene query parser support a blank field? I know the workaround is
-text:[* TO *]

But I like text:() as it is more logical. Is there any reason?

Thanks
-Shankar


Re: Filtering down terms in suggest

2013-06-12 Thread bbarani
I would suggest you take the suggested string and create another query to
solr along with the filter parameter.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filtering-down-terms-in-suggest-tp4069627p4069997.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is wrong with this blank query?

2013-06-12 Thread bbarani
Not sure what you are trying to achieve.

I assume you are trying to return the documents that don't contain any
value in a particular field.

You can use the below query for that..

http://localhost:8983/solr/doc1/select?q=-text:*&debugQuery=on&defType=lucene



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-wrong-with-this-blank-query-tp4069995p4069998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is wrong with this blank query?

2013-06-12 Thread Chris Hostetter

: Subject: What is wrong with this blank query?
: 
: http://localhost:8983/solr/doc1/select?q=text:()&debugQuery=on&defType=lucene

that's not a "blank" query ... when you use the parens you are telling the 
query parser you want to create a BooleanQuery object, and then you aren't 
including any clauses in that BooleanQuery object, which is invalid.

It might be more clear why this makes no sense if you remember that 
writing "text:()" is equivalent to writing "()" -- that "text:" prefix 
just instructs the parser: for the purposes of parsing this boolean 
query, treat "text" as the default field name.

that's why these queries are equivalent...

foo_s:(bar_s:x y)
(bar_s:x foo_s:y)

the fact that you have "text:" in front of the parens doesn't change the 
fact that an empty boolean query makes no sense.


-Hoss


Re: What is wrong with this blank query?

2013-06-12 Thread Jack Krupansky
Try answering this question: What do you imagine the semantics would be for 
that query? I mean, what kind of results do you think it should return that 
would be obvious, apparent, and useful for an average application? Solr is 
simply telling you that it sure looks mighty suspicious!


Why would you intentionally send a request to Solr to get... no data? I 
mean, this sounds like another "XY Problem" - an apparent proposed solution, 
but to what problem? Tell us the problem you are trying to solve.


-- Jack Krupansky

-Original Message- 
From: Shankar Sundararaju

Sent: Wednesday, June 12, 2013 1:18 PM
To: solr-user@lucene.apache.org
Subject: What is wrong with this blank query?

http://localhost:8983/solr/doc1/select?q=text:()&debugQuery=on&defType=lucene

I get this error:
org.apache.solr.search.SyntaxError: Cannot parse 'text:()': Encountered "
")" ") "" at line 1, column 6. Was expecting one of: <NOT> ... "+" ... "-"
... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ...
<WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ...

Why doesn't the Lucene query parser support a blank field? I know the workaround is
-text:[* TO *]

But I like text:() as it is more logical. Is there any reason?

Thanks
-Shankar 



Re: What is Difference Between Down and Gone At Admin Cloud Page?

2013-06-12 Thread Stefan Matheis
The ticket for the legend is SOLR-3915, the definition came up in SOLR-3174:

https://issues.apache.org/jira/browse/SOLR-3174?focusedCommentId=13255923&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13255923
 


On Wednesday, June 12, 2013 at 3:54 PM, Mark Miller wrote:

> 
> On Jun 12, 2013, at 3:19 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> 
> > What is Difference Between Down and Gone At Admin Cloud Page?
> 
> If I remember right, Down can mean the node is still actively working towards 
> something - eg, without action by you, it might go into recovering or active 
> state. Gone means it has given up or disappeared. It's not likely to make 
> another state change without your intervention.
> 
> - Mark 



Re: What is wrong with this blank query?

2013-06-12 Thread Jack Krupansky
Try the LucidWorks Search query parser - it should handle this without 
complaint, since an empty query can be omitted by the parser with no ill 
effect. Solr and Lucene are simply being overly picky.


-- Jack Krupansky

-Original Message- 
From: Shankar Sundararaju

Sent: Wednesday, June 12, 2013 1:18 PM
To: solr-user@lucene.apache.org
Subject: What is wrong with this blank query?

http://localhost:8983/solr/doc1/select?q=text:()&debugQuery=on&defType=lucene

I get this error:
org.apache.solr.search.SyntaxError: Cannot parse 'text:()': Encountered "
")" ") "" at line 1, column 6. Was expecting one of: <NOT> ... "+" ... "-"
... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ...
<WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ...

Why doesn't the Lucene query parser support a blank field? I know the workaround is
-text:[* TO *]

But I like text:() as it is more logical. Is there any reason?

Thanks
-Shankar 



Re: DIH Question

2013-06-12 Thread PeriS
What would be the process to index a new record added to an existing db using DIH?

On Jun 12, 2013, at 1:06 PM, PeriS  wrote:

> 1. Current num records in db: 4
> 2. Created a new record and called delta-import using the following url; 
> localhost:8983/dataimport?command=delta-import
> SOLR Log here : http://apaste.info/gF3N
> 3. When i tried checking the status from the browser - logs are here: 
> http://apaste.info/gxDF
> 
> db-import-config.xml is here : http://apaste.info/3t0K
> 
> Thanks
> -Peri
> 
> 
> On Jun 12, 2013, at 12:11 PM, Gora Mohanty  wrote:
> 
>> Please provide sufficient details as we do not have access
>> to your server, and there could be a million things that are
>> wrong.
>> 
>> Start by sharing your DIH configuration file, the exact URL
>> that you are using for doing the delta import, and the message
>> that you get in the browser when the delta import completes: This
>> will have details about how many documents were picked up.
>> Also, how are you checking that the new document was not
>> indexed.
>> 
>> Regards,
>> Gora
> 



Re: shardkey

2013-06-12 Thread Rishi Easwaran
From my understanding:
In SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter.
The CompositeId router is the default if your numShards>1 on collection creation.
The CompositeId router generates a hash using the uniqueKey defined in your 
schema.xml to route your documents to a dedicated shard.

You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit only 
the shard that has your shard.key  
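
For illustration, a hypothetical composite-id setup: index documents with ids
like user123!doc456, then restrict a query to the matching shard with

select?q=status:active&shard.keys=user123!

(the user123 prefix and the status field are made-up names).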

 

 Thanks,

Rishi.

 

-Original Message-
From: Joshi, Shital 
To: 'solr-user@lucene.apache.org' 
Sent: Wed, Jun 12, 2013 10:01 am
Subject: shardkey


Hi,

We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple of 
questions on the shard key. 

1. Looking at the admin GUI, how do I know which field is being used for the 
shard key?
2. What is the default shard key used?
3. How do I override the default shard key?

Thanks. 

 


Configuring Solr to connect to a SQL server instance

2013-06-12 Thread Daniel Mosesson
I currently have the following:

I am running the example-DIH instance of solr, and it works fine.
I then changed the db-data-config.xml file to make the dataSource the following:



As far as I can tell from the SQL profiler, it is never able to log in, or even 
attempt to connect.

I did get the JDBC .jar file and the sqljdbc_auth.dll file, and loaded them into 
example-DIH\solr\db\lib

The error I am getting from the attempted import is as follows:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select * from temp_ip_solr_test Processing Document # 1

What could I be doing wrong?
Solr version 4.3





Re: Configuring Solr to connect to a SQL server instance

2013-06-12 Thread bbarani
The below config file works fine with sql server. Make sure you are using the
correct database / server name.

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="username"
              password="password"/>
  <document>
    <entity name="item" query="select * from item">
      <field column="ID" name="id"/>
    </entity>
  </document>
</dataConfig>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configuring-Solr-to-connect-to-a-SQL-server-instance-tp4070005p4070010.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FW: Solr and Lucene

2013-06-12 Thread heikki
This link might be useful too:
http://www.semanticmetadata.net/2013/04/11/luke-4-2-binaries/.

Kind regards,
Heikki Doeleman



On Wed, Jun 12, 2013 at 3:45 PM, Rafał Kuć  wrote:

> Hello!
>
> Solr 4.2.1 is using Lucene 4.2.1. Basically Solr and Lucene are
> currently using the same numbers after their development was merged.
>
> As for Luke, I think the last version is using a beta or alpha
> release of Lucene 4.0. I would try replacing the Lucene jars and see if
> it works, although I didn't try it.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Hi,
>
> > Which lucene version is used with Solr 4.2.1? And is it possible to open
> > it by luke? If not, by any other tool? Thanks
>
> > Thanks
>
>


RE: shardkey

2013-06-12 Thread James Thomas
This page has some good information on custom document routing: 
http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud



-Original Message-
From: Rishi Easwaran [mailto:rishi.easwa...@aol.com] 
Sent: Wednesday, June 12, 2013 1:40 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

From my understanding:
In SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter.
The CompositeId router is the default if your numShards>1 on collection creation.
The CompositeId router generates a hash using the uniqueKey defined in your 
schema.xml to route your documents to a dedicated shard.

You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit only 
the shard that has your shard.key  

 

 Thanks,

Rishi.

 

-Original Message-
From: Joshi, Shital 
To: 'solr-user@lucene.apache.org' 
Sent: Wed, Jun 12, 2013 10:01 am
Subject: shardkey


Hi,

We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple of 
questions on the shard key. 

1. Looking at the admin GUI, how do I know which field is being used for the 
shard key?
2. What is the default shard key used?
3. How do I override the default shard key?

Thanks. 

 


Solr Shards and ZooKeeper

2013-06-12 Thread Kalyan Kuram
Hi all,

I am trying to configure an external zookeeper with solr instances which has
to have 2 shards. I tried the introductory solrcloud wiki page and the
lucidworks solrcloud page; it works just fine (embedded zookeeper). The
problem I have is starting solr with 2 shards when I have an external
zookeeper; I can't get solr to start with 2 shards.

Steps followed:
1. start zookeeper
2. Start the first solr instance with args:
   nohup java -Dbootstrap_confdir=./solr/Articles/conf
   -Dcollection.configName=Articles -DzkHost=dev-core-solr1:2181
   -DnumShards=2 -jar start.jar &
3. Start the second solr instance:
   nohup java -DzkHost=dev-core-solr1:2181 -jar start.jar &

And when I navigate to the cloud page I see shard1 connected to 2 solr
instances instead of shard1 connected to solrinstance1 and shard2 to
solrinstance2. The behaviour is not the same when I start the embedded
zookeeper from solr; there I see shard1 connected to solrinstance1 and
shard2 connected to solrinstance2. Am I doing something wrong or have I
missed any steps? Please help.

Kalyan

Sorting by field is slow

2013-06-12 Thread Shane Perry
In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
increased exponentially.  After testing in 4.3.0 it appears the same query
(with 1 matching document) returns after 100 ms without sorting but takes 1
minute when sorting by a text field.  I've looked around but haven't yet
found a reason for the degradation.  Can someone give me some insight or
point me in the right direction for resolving this?  In most cases, I can
change my code to do client-side sorting but I do have a couple of
situations where pagination prevents client-side sorting.  Any help would
be greatly appreciated.

Thanks,

Shane


Dynamically create new fields

2013-06-12 Thread Van Tassell, Kristian
We have a need to dynamically create new fields. These fields would mostly be 
used for new facet types.

While I could modify, as needed, the schema, that presents some deployment 
issues (such as needing to restart the Solr service). Whereas, something such 
as elasticsearch's schema-free model, where fields do not need to be defined 
ahead of time seems more like what we need.

I've tried to come up with alternatives, such as a predefined dynamic field 
that could take name:value pairs separated by some type of delimiter, but given 
that the data type may differ per new definition, and that this is just a 
clunky design overall, I can't seem to figure out a way to do this.

Does anyone have any ideas or can you point me to a write-up or documentation 
that addresses this?

Thanks,
Kristian


Re: Sorting by field is slow

2013-06-12 Thread bbarani
http://wiki.apache.org/solr/SolrPerformanceFactors

If you do a lot of field based sorting, it is advantageous to add explicitly
warming queries to the "newSearcher" and "firstSearcher" event listeners in
your solrconfig which sort on those fields, so the FieldCache is populated
prior to any queries being executed by your users.
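
A minimal sketch of such a listener for solrconfig.xml, assuming the sort
field is named text:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">text asc</str>
    </lst>
  </arr>
</listener>

The same block can be repeated for the firstSearcher event.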



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-by-field-is-slow-tp4070026p4070028.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically create new fields

2013-06-12 Thread bbarani
Dynamically adding fields to the schema has not yet been released.

https://issues.apache.org/jira/browse/SOLR-3251

We used dynamic field and copy field for dynamically creating facets...

We had too many dynamic fields (retrieved from a database table) and we had
to make sure that facets exist for the new fields.

schema.xml example: 

 <dynamicField name="*" type="text_general" indexed="true" stored="true"/>

 <dynamicField name="*Facet" type="string" indexed="true" stored="true"/>

 <copyField source="*" dest="*Facet"/>

This way we were able to access the facets using the fieldname followed by
keyword 'Facet'

For ex: name field has facet field nameFacet




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-create-new-fields-tp4070029p4070031.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically create new fields

2013-06-12 Thread Chris Hostetter

: Dynamically adding fields to the schema has not yet been released.
: 
: https://issues.apache.org/jira/browse/SOLR-3251

Just to clarify...

*explicitly* adding fields dynamically based on client commands has been 
implemented and will be included in Solr 4.4
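
As a sketch, the explicit route is a REST call against the schema (the exact
endpoint and payload are per the SOLR-3251 documentation; the field name and
attributes below are made up):

curl -X PUT 'http://localhost:8983/solr/collection1/schema/fields/newfield' \
  -H 'Content-type: application/json' \
  -d '{"type":"string", "stored":true, "indexed":true}'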

*implicitly* adding fields dynamically based on what documents are added 
to the index is a feature that sarowe is still currently working on...

https://issues.apache.org/jira/browse/SOLR-3250


-Hoss


Need help with search in multiple indexes

2013-06-12 Thread smanad
Hi, 

I am thinking of using Solr to implement Search on our site. Here is my use
case, 
1. We will have multiple (4-5) indexes based on different data
types/structures, and data will be indexed into these by several processes,
like cron, on demand, through message queue applications, etc. 
2. A single web service needs to search across all these indexes and return
results. 

I am thinking of using Solr 4.2.1 or may be 4.3 with single instance -
multicore setup. 
I read about distributed search and I believe I should be able to search
across multiple indices using the shards parameter. However, in my case, all
shards will be on the same host/port but with different core names.
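For example, something along these lines (core names assumed):

http://localhost:8983/solr/core0/select?q=foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1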

Is my understanding correct? Or is there any better alternative to this
approach?

Please suggest. 
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Dynamically create new fields

2013-06-12 Thread Van Tassell, Kristian
Great, thank you!

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, June 12, 2013 2:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Dynamically create new fields


: Dynamically adding fields to the schema has not yet been released.
: 
: https://issues.apache.org/jira/browse/SOLR-3251

Just to clarify...

*explicitly* adding fields dynamically based on client commands has been 
implemented and will be included in Solr 4.4

*implicitly* adding fields dynamically based on what documents are added to the 
index is a feature that sarowe is still currently working on...

https://issues.apache.org/jira/browse/SOLR-3250


-Hoss


Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
Manasi,

Everything hinges on these indexes having similar enough schema that they
can be represented as a union of all the fields from each type, where most
of the searched data is common to all types. If so, you have a few options
for querying them all together... distributed search, creating one large
index and adding a type field, etc.
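
For example, with one large index and a (hypothetical) doctype field, each
query can be restricted to one type of data with a filter query:

http://localhost:8983/solr/collection1/select?q=harry+potter&fq=doctype:book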

If, however, your data is heterogeneous enough that the schemas are not
really comparable, you're probably stuck coordinating the results
externally.


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Wed, Jun 12, 2013 at 3:55 PM, smanad  wrote:

> Hi,
>
> I am thinking of using Solr to implement Search on our site. Here is my use
> case,
> 1. We will have multiple 4-5 indexes based on different data
> types/structures and data will be indexed into these by several processes,
> like cron, on demand, thru message queue applications, etc.
> 2. A single web service needs to search across all these indexes and return
> results.
>
> I am thinking of using Solr 4.2.1 or may be 4.3 with single instance -
> multicore setup.
> I read about distributed search and I believe I should be able to search
> across multiple indices using the shards parameter. However, in my case, all
> shards will be on the same host/port but with different core names.
>
> Is my understanding correct? Or is there any better alternative to this
> approach?
>
> Please suggest.
> Thanks,
> -Manasi
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Thanks for the reply Michael. 

In some cases the schema is similar, but not in all of them. So let's go with
the assumption of the schemas NOT being similar.

I am not quite sure what you mean by "you're probably stuck coordinating the
results externally." Do you mean searching in each index and then somehow
merging results manually? Will I still be able to use the shards parameter, or
not?

Also, I was planning to use the PHP library SolrClient. Do you see any downside?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting by field is slow

2013-06-12 Thread Jack Krupansky
Rerun the sorted query with &debugQuery=true and look at the module timings. 
See what stands out


Are you actually sorting on a "text" field, as opposed to a "string" field?

Of course, it's always possible that maybe you're hitting some odd OOM/GC 
condition as a result of Solr growing  between releases.


-- Jack Krupansky

-Original Message- 
From: Shane Perry

Sent: Wednesday, June 12, 2013 3:00 PM
To: solr-user@lucene.apache.org
Subject: Sorting by field is slow

In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
increased exponentially.  After testing in 4.3.0 it appears the same query
(with 1 matching document) returns after 100 ms without sorting but takes 1
minute when sorting by a text field.  I've looked around but haven't yet
found a reason for the degradation.  Can someone give me some insight or
point me in the right direction for resolving this?  In most cases, I can
change my code to do client-side sorting but I do have a couple of
situations where pagination prevents client-side sorting.  Any help would
be greatly appreciated.

Thanks,

Shane 



Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
> I am not quite sure what you mean by "you're probably stuck coordinating
> the
> results externally. " Do you mean, searching in each index and then somehow
> merge results manually? will I still be able to use shards parameters? or
> no?
>

If your schemas don't match up, you can't use distributed search, so yes,
manual merging. You can't use the shards parameter across indexes with
incompatible schema.

I'd strongly consider just including all the fields in a single schema and
leaving them blank if they don't apply to a given type of data.



> Also, I was planning to use php library SolrClient. Do you see any
> downside?
>

No, this works fine!


RE: Solr Shards and ZooKeeper

2013-06-12 Thread Kalyan Kuram
It worked. I followed the steps; the only difference is that I erased
everything and started from scratch again.

> From: kalyan.ku...@live.com
> To: solr-user@lucene.apache.org
> Subject: Solr Shards and ZooKeeper
> Date: Wed, 12 Jun 2013 14:51:41 -0400
> 
> Hi all,
>
> I am trying to configure an external zookeeper with solr instances which has
> to have 2 shards. I tried the introductory solrcloud wiki page and the
> lucidworks solrcloud page; it works just fine (embedded zookeeper). The
> problem I have is starting solr with 2 shards when I have an external
> zookeeper; I can't get solr to start with 2 shards.
>
> Steps followed:
> 1. start zookeeper
> 2. Start the first solr instance with args:
>    nohup java -Dbootstrap_confdir=./solr/Articles/conf
>    -Dcollection.configName=Articles -DzkHost=dev-core-solr1:2181
>    -DnumShards=2 -jar start.jar &
> 3. Start the second solr instance:
>    nohup java -DzkHost=dev-core-solr1:2181 -jar start.jar &
>
> And when I navigate to the cloud page I see shard1 connected to 2 solr
> instances instead of shard1 connected to solrinstance1 and shard2 to
> solrinstance2. The behaviour is not the same when I start the embedded
> zookeeper from solr; there I see shard1 connected to solrinstance1 and
> shard2 connected to solrinstance2. Am I doing something wrong or have I
> missed any steps? Please help.
>
> Kalyan
  

Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Is this a limitation of solr/lucene? Should I be considering another
option like Elasticsearch (which is also based on lucene)? 
But I am sure search across multiple indexes is a fairly common problem.

Also, I was reading this post
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
in one of the comments it says, 
"So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
importing data into this core? And then create a query handler, that
includes the shard parameter. So when I query Core3, it will never really
contain indexed data, but because of the shard searching it will fetch the
results from the other to cores, and "present" it on the 3rd core? Thanks
for the help! "

Is that what I should be doing? So all the indexing still happens in
separate cores but searching happens in one single core?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Thanks for the responses.

Setting first/newSearcher had no noticeable effect.  I'm sorting on a
stored/indexed field named 'text' whose fieldType is solr.TextField.
Overall, the values of the field are unique. The JVM is only using about
2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
server in question is a slave with approximately 56 million documents.
Additionally, sorting on a field of the same type but with significantly
less uniqueness results in quick response times.

The following is a sample of *debugQuery=true* for a query which returns 1
document:


<lst name="process">
  <double name="time">61458.0</double>
  <lst name="query">
    <double name="time">61452.0</double>
  </lst>
  <lst name="facet">
    <double name="time">0.0</double>
  </lst>
  <lst name="mlt">
    <double name="time">0.0</double>
  </lst>
  <lst name="highlight">
    <double name="time">0.0</double>
  </lst>
  <lst name="stats">
    <double name="time">0.0</double>
  </lst>
  <lst name="debug">
    <double name="time">6.0</double>
  </lst>
</lst>



-- Update --

Out of desperation, I turned off replication by commenting out the *<lst name="slave">* element in the replication requestHandler block.  After
restarting tomcat I was surprised to find that the replication admin UI
still reported the core as replicating.  Search queries were still slow.  I
then disabled replication via the UI and the display updated to report the
core was no longer replicating.  Queries are now fast so it appears that
the sorting may be a red-herring.

It may be of note to also mention that the slow queries don't appear to
be getting cached.

Thanks again for the feed back.

On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky wrote:

> Rerun the sorted query with &debugQuery=true and look at the module
> timings. See what stands out
>
> Are you actually sorting on a "text" field, as opposed to a "string" field?
>
> Of course, it's always possible that maybe you're hitting some odd OOM/GC
> condition as a result of Solr growing  between releases.
>
> -- Jack Krupansky
>
> -Original Message- From: Shane Perry
> Sent: Wednesday, June 12, 2013 3:00 PM
> To: solr-user@lucene.apache.org
> Subject: Sorting by field is slow
>
>
> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> increased exponentially.  After testing in 4.3.0 it appears the same query
> (with 1 matching document) returns after 100 ms without sorting but takes 1
> minute when sorting by a text field.  I've looked around but haven't yet
> found a reason for the degradation.  Can someone give me some insight or
> point me in the right direction for resolving this?  In most cases, I can
> change my code to do client-side sorting but I do have a couple of
> situations where pagination prevents client-side sorting.  Any help would
> be greatly appreciated.
>
> Thanks,
>
> Shane
>


Re: Need help with search in multiple indexes

2013-06-12 Thread Michael Della Bitta
I had not heard of that technique before. Interesting!

But couldn't you do the same thing with a unified schema spread among your
cores?

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Wed, Jun 12, 2013 at 5:05 PM, smanad  wrote:

> Is this a limitation of solr/lucene? Should I be considering another
> option like Elasticsearch (which is also based on lucene)?
> But I am sure search across multiple indexes is a fairly common problem.
>
> Also, I was reading this post
>
> http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
> in one of the comments it says,
> "So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
> fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
> with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
> importing data into this core? And then create a query handler, that
> includes the shard parameter. So when I query Core3, it will never really
> contain indexed data, but because of the shard searching it will fetch the
> results from the other to cores, and "present" it on the 3rd core? Thanks
> for the help! "
>
> Is that what I should be doing? So all the indexing still happens in
> separate cores but searching happens in one single core?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: java.lang.NullPointerException. I am trying to use CachedSqlEntityProcessor

2013-06-12 Thread srinalluri
I have solved this problem and able work with CachedSqlEntityProcessor
successfully after a very long struggle. 

I tried this on 4.2.

There are still existing bugs it seems:
1. Whatever you mention in cacheKey, that field name must be in the select
statement explicitly.
2. If I am correct, the field names in cacheKey and in the select statement
are case sensitive.
3. We have an ID field in our table, and I tried to give cacheKey="ID". But
that conflicted with the uniqueKey, as the uniqueKey is also "ID". So I wrote
it as "SELECT ID AS AID, ..." and cacheKey="AID".

thanks
Srini  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-I-am-trying-to-use-CachedSqlEntityProcessor-tp4059815p4070059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
In my case, different teams will be updating indexes at different intervals,
so having separate cores gives more control. However, I can still
update (add/edit/delete) data with conditions like checking for doc type.

It's just that using shards sounds much cleaner and more readable.

However, I am not yet sure if there might be any performance issues.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070061.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread Jack Krupansky
Michael's point was that the schemas need to be compatible. I mean, if you 
query fields A, B, C, and D, and index1 has fields A and B, while index2 has 
fields C and D, and index3 has fields E and F, what kind of results do you 
think you will get back?!


Whether the schemas must be identical is not absolutely clear, but they at 
least have to include all the fields that queries will use. And... key 
values need to be unique across indexes.


Yes, Solr CAN do it. But to imagine that it would give reasonable query 
results with no coordination between the developers of the separate indexes 
is a little too much.


The bottom line: Somebody needs to coordinate the development of the schemas 
for the separate indexes so that they will be compatible from a query term 
and key value perspective, at a minimum.


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Wednesday, June 12, 2013 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Need help with search in multiple indexes

Is this a limitation of solr/lucene? Should I be considering another
option like Elasticsearch (which is also based on lucene)?
But I am sure search across multiple indexes is a fairly common problem.

Also, I was reading this post
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
in one of the comments it says,
"So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
fields documentId,fieldC,fieldD. Then I create another core, lets say Core3
with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
importing data into this core? And then create a query handler, that
includes the shard parameter. So when I query Core3, it will never really
contain indexed data, but because of the shard searching it will fetch the
results from the other to cores, and "present" it on the 3rd core? Thanks
for the help! "

Is that what I should be doing? So all the indexing still happens in
separate cores but searching happens in one single core?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr.xml dataDir attribute persistence

2013-06-12 Thread aus...@3bx.org
Hello,



I’m attempting to figure out what’s required for my Solr implementation to
dynamically create new cores based on a template set of config files.



My plan is to use this “template” directory as the instance directory for
multiple cores, while maintaining a separate data directory for each core.



I’ve found that when I issue a CREATE command via the CoreAdmin handler,
everything seems to work fine; however, once Solr is restarted, those cores
that were created start using the default dataDir (i.e.
<instanceDir>/data).
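
A sketch of such a CREATE call, with names assumed:

http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=template&dataDir=/var/solr/core2/data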



It appears that the dataDir attribute is being applied when the core is
initially created, but it isn’t actually being persisted in the solr.xml
file along with the other core information.



Am I missing something on how I should be using this particular property?



Thanks in advance!


Re: SOLR-4641: Schema now throws exception on illegal field parameters.

2013-06-12 Thread Erick Erickson
bbarani:

Where did you see this? I haven't seen it before and I get an error on
startup if I add validate="false" to a <field> definition

Thanks,
Erick

On Tue, Jun 11, 2013 at 12:33 PM, bbarani  wrote:
> I think if you use validate=false in schema.xml, field or dynamicField level,
> Solr will disable validation.
>
> I think this only works in solr 4.3 and above..
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SOLR-4641-Schema-now-throws-exception-on-illegal-field-parameters-tp4069622p4069688.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR-4641: Schema now throws exception on illegal field parameters.

2013-06-12 Thread Erick Erickson
But see Steve Rowe's comments at
https://issues.apache.org/jira/browse/SOLR-4641 and use custom child
properties as:

<field name="myfield" type="string" indexed="true" stored="true">
  <customProperty>VALUE</customProperty>
  ...
</field>

Best
Erick

On Wed, Jun 12, 2013 at 6:49 PM, Erick Erickson  wrote:
> bbarani:
>
> Where did you see this? I haven't seen it before and I get an error on
> startup if I add validate="false" to a  definition
>
> Thanks,
> Erick
>
> On Tue, Jun 11, 2013 at 12:33 PM, bbarani  wrote:
>> I think if you use validate=false in schema.xml, field or dynamicField level,
>> Solr will not disable validation.
>>
>> I think this only works in solr 4.3 and above..
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/SOLR-4641-Schema-now-throws-exception-on-illegal-field-parameters-tp4069622p4069688.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to ignore folder collection1 when running single instance of SOLR?

2013-06-12 Thread Erick Erickson
Discovery should be out with 4.4

On Tue, Jun 11, 2013 at 4:19 PM, Upayavira  wrote:
> What you are doing by removing solr.xml is reverting to the old Solr 3.x
> 'single core' setup. Erick is suggesting that this is best considered
> deprecated, and will make life harder for you with future releases.
>
> If you don't like the reference to 'collection1', rename it to something
> else. Stop Solr, rename collection1 to another name, change the
> references to it in solr.xml, then restart Solr.
>
> Upayavira
>
> On Tue, Jun 11, 2013, at 06:41 PM, bbarani wrote:
>> Erick,
>>
>> Thanks a lot for your response.
>>
>> Just to confirm if I am right, I need to use solr.xml even if I change
>> the
>> folder structure as below. Am I right?
>>
>> Do you have any idea when "discovery-based" core enumeration feature
>> would
>> be released?
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Ignore-folder-collection1-when-running-single-instance-of-SOLR-without-using-solr-xml-tp4069416p4069715.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: upgrading 1hr autoCommit behavior

2013-06-12 Thread Erick Erickson
Just to pile on, transaction logs do use up some memory, but they do
NOT store the whole document in memory; docs are flushed to the tlog
on disk. What is kept in memory is some basic doc info (unique id?)
and a pointer to that doc in the tlog, so not much really unless you're
keeping a boatload of docs.

But hard commits with openSearcher=false won't do much of anything.
They don't open a new searcher, invalidate caches, etc. What they do
is close the current segment and start a new one. But that segment
isn't used until the next time a searcher is opened.

FWIW,
Erick

On Tue, Jun 11, 2013 at 8:39 PM, Chris Hostetter
 wrote:
>
> : > However, we are wondering how to best setup autoCommit/autoSoftCommit on
> : > masters to preserve the old behavior. It seems that setting autoCommit to
> : > 1hr (openSearcher=true) without any autoSoftCommit preserves our previous
> : > setup - is this correct? Wil the transaction log make masters use much 
> more
> : > heap due to 1hr periods between commits? This can be a problem for us
> : > because we put many master cores on one solr JVM
> :
> : If you want to completely preserve your previous setup, then you've got it
> : correct.  Depending on how much you index over the course of that hour, you
> : might want to go a different way.
>
> if you want to *exactly* recreate the behavior you had before, use the
> same autoCommit settings you had before, and don't add any updateLog
> config.
>
> however: if the only reason you were using 1 hour autocommits was to
> minimize the searcher re-opening on slaves to improve query performance
> via caching, then i would suggest that you switch to using autocommit more
> frequently, and instead change your slave polling interval to only be once
> an hour -- that recommendation is independent of which version of solr you
> use, it will just provide you better durability of updates regardless of
> whether you use the update log.
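>
> a minimal sketch of that combination (values assumed): in solrconfig.xml on
> the master,
>
>   <autoCommit>
>     <maxTime>60000</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
> and in the slave's replication handler,
>
>   <str name="pollInterval">01:00:00</str>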
>
> If you want to take advantage of updateLog features (ie: doing atomic
> updates or optimistic concurrency updates against your master) or want
> improved durability w/o needing to block on every doc update to wait for a
> hard commit, then enable updateLog, set a reasonable autocommit -- and
> still continue to use that long polling interval on your slaves.
>
> concerns about tradeoffs between updateLog size, autoCommit,
> autoSoftCommit vs openSearcher=true & caches are really only a big deal in
> a SolrCloud type setup -- in classic replication the snappull frequency
> acts as a mitigator between the former and the latter.
>
>
>
> -Hoss


Re: document indexing

2013-06-12 Thread Erick Erickson
Questions:
 What does your Solr admin page say?
Did you commit after you indexed the doc?
What is your evidence that a search fails?
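
If the answer to the second question is no, a commit can be issued
explicitly, e.g.:

curl "http://localhost:8983/solr/update?commit=true"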

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Wed, Jun 12, 2013 at 5:16 AM, sodoo  wrote:
> Hi all,
>
> I am a beginner and I am trying to index pdf, docx, and txt files.
> How can I index these file formats?
>
> I have installed the solr server in /opt/solr.
> Also I have created a "documents" directory. Then I copied the files to be
> indexed into /opt/solr/documents.
>
> I tried to index with the below command. It seemed to index; I looked at the
> log file and the doc index log was written. But unfortunately the searched
> text is not found.
>
> curl
> "http://localhost:8983/solr/update/extract?stream.file=/opt/solr/document/Web_Hosting_Instruction.pdf&literal.id=doc1"
>
> Please advise & assist me.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p4069871.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting by field is slow

2013-06-12 Thread Erick Erickson
This doesn't make much sense, particularly the fact
that you added first/new searchers. I'm assuming that
these are sorting on the same field as your slow query.

But sorting on a text field for which
"Overall, the values of the field are unique"
is a red-flag. Solr doesn't sort on fields that have
more than one term, so you might as well use a
string field and be done with it, it's possible you're
hitting some edge case.
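
A sketch of that, with an assumed field name: add a separate string copy of
the field for sorting,

<field name="text_sort" type="string" indexed="true" stored="false"/>
<copyField source="text" dest="text_sort"/>

and then sort on text_sort instead.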

Did you just copy your 3.6 schema and configs to
4.3? Did you re-index?

Best
Erick

On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> Thanks for the responses.
>
> Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> stored/indexed field named 'text' whose fieldType is solr.TextField.
> Overall, the values of the field are unique. The JVM is only using about
> 2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
> server in question is a slave with approximately 56 million documents.
> Additionally, sorting on a field of the same type but with significantly
> less uniqueness results in quick response times.
>
> The following is a sample of *debugQuery=true* for a query which returns 1
> document:
>
> <lst name="process">
>   <double name="time">61458.0</double>
>   <lst name="query">
>     <double name="time">61452.0</double>
>   </lst>
>   <lst name="facet">
>     <double name="time">0.0</double>
>   </lst>
>   <lst name="mlt">
>     <double name="time">0.0</double>
>   </lst>
>   <lst name="highlight">
>     <double name="time">0.0</double>
>   </lst>
>   <lst name="stats">
>     <double name="time">0.0</double>
>   </lst>
>   <lst name="debug">
>     <double name="time">6.0</double>
>   </lst>
> </lst>
>
>
> -- Update --
>
> Out of desperation, I turned off replication by commenting out the *<lst name="slave">* element in the replication requestHandler block.  After
> restarting tomcat I was surprised to find that the replication admin UI
> still reported the core as replicating.  Search queries were still slow.  I
> then disabled replication via the UI and the display updated to report the
> core was no longer replicating.  Queries are now fast so it appears that
> the sorting may be a red-herring.
>
> It may be of note to also mention that the slow queries don't appear to
> be getting cached.
>
> Thanks again for the feed back.
>
> On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky 
> wrote:
>
>> Rerun the sorted query with &debugQuery=true and look at the module
>> timings. See what stands out
>>
>> Are you actually sorting on a "text" field, as opposed to a "string" field?
>>
>> Of course, it's always possible that maybe you're hitting some odd OOM/GC
>> condition as a result of Solr growing  between releases.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Shane Perry
>> Sent: Wednesday, June 12, 2013 3:00 PM
>> To: solr-user@lucene.apache.org
>> Subject: Sorting by field is slow
>>
>>
>> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
>> increased exponentially.  After testing in 4.3.0 it appears the same query
>> (with 1 matching document) returns after 100 ms without sorting but takes 1
>> minute when sorting by a text field.  I've looked around but haven't yet
>> found a reason for the degradation.  Can someone give me some insight or
>> point me in the right direction for resolving this?  In most cases, I can
>> change my code to do client-side sorting but I do have a couple of
>> situations where pagination prevents client-side sorting.  Any help would
>> be greatly appreciated.
>>
>> Thanks,
>>
>> Shane
>>


Re: Solr.xml dataDir attribute persistence

2013-06-12 Thread Erick Erickson
This is a know bug, see:
https://issues.apache.org/jira/browse/SOLR-4862

Solr.xml persistence has several shortcomings. As
it happens I'm working on that right now, but the
results won't be ready until 4.4. I hope to get a patch
up over the weekend (SOLR-4910) and this is one
of the things I want to explicitly insure is working.
SOLR-4910 should fix SOLR-4862 as well.

In the mean time, one work-around would be to
copy the template to your core for the new
directory and create it there. Copies extra
data, but...

FWIW, this is all about to change, the combination
of "core discovery" and "named config sets" should
be just what you need, see SOLR-4478


Best
Erick

On Wed, Jun 12, 2013 at 5:43 PM, aus...@3bx.org  wrote:
> Hello,
>
>
>
> I’m attempting to figure out what’s required for my Solr implementation to
> dynamically create new cores based on a template set of config files.
>
>
>
> My plan is to use this “template” directory as the instance directory for
> multiple cores, while maintaining a separate data directory for each core.
>
>
>
> I’ve found that when I issue a CREATE command via the CoreAdmin handler,
> everything seems to work fine; however, once Solr is restarted, those cores
> that were created start using the default dataDir (i.e.
> /data).
>
>
>
> It appears that the dataDir attribute is being applied when the core is
> initially created, but it isn’t actually being persisted in the solr.xml
> file along with the other core information.
>
>
>
> Am I missing something on how I should be using this particular property?
>
>
>
> Thanks in advance!


Re: SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)

2013-06-12 Thread Lance Norskog
In 4.x and trunk there is a close() method on Tokenizers and Filters. In 
currently released versions up to 4.3, there is instead a reset(stream) 
method, which is how a Tokenizer & Filter is reset for a following document 
in the same upload.


In both cases I had to track the first time the tokens are consumed, and 
do all of the setup then. If you do this, then reset(stream) can clear 
the native resources, and let you re-load them on the next consume.


Look at LUCENE-2899 in OpenNLPTokenizer and OpenNLPFilter.java to see 
what I had to do.


But yes, to be absolutely sure, you need to add a finalizer.

On 06/12/2013 04:34 AM, Benson Margulies wrote:

Could I have some help on the combination of these two? Right now, it
appears that I'm stuck with a finalizer to chase after native
resources in a Tokenizer. Am I missing something?




Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Erick,

I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
the distribution example with my 3.6 solrconfig.xml, pulling out what I
didn't need.  There is the possibility I removed something I shouldn't have
though I don't know what it would be.  Minus removing the dynamic fields, a
custom tokenizer class, and changing all my fields to be stored, the
schema.xml file should be the same as well.  I'm not currently in the
position to do so, but I'll double check those two files.  Finally, the
data was re-indexed when I moved to 4.3.

My statement about field values wasn't stated very well.  What I meant is
that the 'text' field has more unique terms than some of my other fields.

As for this being an edge case, I'm not sure why it would manifest itself
in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
 If I get a chance, I'll see if I can duplicate the behavior with a small
document count in a sandboxed environment.

Shane

On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson wrote:

> This doesn't make much sense, particularly the fact
> that you added first/new searchers. I'm assuming that
> these are sorting on the same field as your slow query.
>
> But sorting on a text field for which
> "Overall, the values of the field are unique"
> is a red flag. Solr doesn't sort on fields that have
> more than one term, so you might as well use a
> string field and be done with it; it's possible you're
> hitting some edge case.
>
> Did you just copy your 3.6 schema and configs to
> 4.3? Did you re-index?
>
> Best
> Erick
>
> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> > Thanks for the responses.
> >
> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> > stored/indexed field named 'text' whose fieldType is solr.TextField.
> > Overall, the values of the field are unique.  The JVM is only using about
> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
> > The server in question is a slave with approximately 56 million documents.
> > Additionally, sorting on a field of the same type but with significantly
> > less uniqueness results in quick response times.
> >
> > The following is a sample of *debugQuery=true* for a query which returns
> > 1 document:
> >
> > <lst name="process">
> >   <double name="time">61458.0</double>
> >   <lst name="query">
> >     <double name="time">61452.0</double>
> >   </lst>
> >   <lst name="facet">
> >     <double name="time">0.0</double>
> >   </lst>
> >   <lst name="mlt">
> >     <double name="time">0.0</double>
> >   </lst>
> >   <lst name="highlight">
> >     <double name="time">0.0</double>
> >   </lst>
> >   <lst name="stats">
> >     <double name="time">0.0</double>
> >   </lst>
> >   <lst name="debug">
> >     <double name="time">6.0</double>
> >   </lst>
> > </lst>
> >
> >
> > -- Update --
> >
> > Out of desperation, I turned off replication by commenting out the
> > *<lst name="slave">* element in the replication requestHandler block.
> > After restarting Tomcat I was surprised to find that the replication
> > admin UI still reported the core as replicating.  Search queries were
> > still slow.  I then disabled replication via the UI and the display
> > updated to report the core was no longer replicating.  Queries are now
> > fast, so it appears that the sorting may be a red herring.
> >
> > It may also be worth noting that the slow queries don't appear to
> > be getting cached.
> >
> > Thanks again for the feedback.
> >
> > On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky wrote:
> >
> >> Rerun the sorted query with &debugQuery=true and look at the module
> >> timings. See what stands out.
> >>
> >> Are you actually sorting on a "text" field, as opposed to a "string"
> >> field?
> >>
> >> Of course, it's always possible that you're hitting some odd OOM/GC
> >> condition as a result of Solr growing between releases.
> >>
> >> -- Jack Krupansky
> >>
> >> -----Original Message----- From: Shane Perry
> >> Sent: Wednesday, June 12, 2013 3:00 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Sorting by field is slow
> >>
> >>
> >> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> >> increased exponentially.  After testing in 4.3.0 it appears the same
> >> query (with 1 matching document) returns after 100 ms without sorting
> >> but takes 1 minute when sorting by a text field.  I've looked around
> >> but haven't yet found a reason for the degradation.  Can someone give
> >> me some insight or point me in the right direction for resolving this?
> >> In most cases, I can change my code to do client-side sorting but I do
> >> have a couple of situations where pagination prevents client-side
> >> sorting.  Any help would be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Shane
> >>
>
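
As a point of reference, the slave section mentioned above lives in the
replication requestHandler in solrconfig.xml; a typical one looks
roughly like this (the masterUrl and pollInterval values are
illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Polling can also be toggled at runtime through the handler itself, e.g.
http://slave-host:8983/solr/replication?command=disablepoll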


Re: document indexing

2013-06-12 Thread sodoo
Thank you for the quick reply. I have solved the problem.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p4070116.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: reg: efficient querying using solr

2013-06-12 Thread gururaj kosuru
Hi Chris,
thanks for sharing the document. It is very helpful to have
an estimate of what is consuming the memory.




On 12 June 2013 10:47, Chris Morley  wrote:

> This might help (indirectly):
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
>
> 
>  From: "gururaj kosuru" 
> Sent: Wednesday, June 12, 2013 12:28 AM
> To: "solr-user" 
> Subject: Re: reg: efficient querying using solr
>
> Thanks Walter, Shawn and Otis for the assistance. I will look into tuning
> the parameters by experimenting, as that seems to be the only way to go.
>
> On 11 June 2013 19:17, Shawn Heisey  wrote:
>
> > On 6/11/2013 12:15 AM, gururaj kosuru wrote:
> > > How can one calculate an ideal max shard size for a solr core instance
> > > if I am running a cloud with multiple systems of 4GB?
> >
> > That question is impossible to answer without experimentation, but
> > here's a good starting point.  That's all it is, a starting point:
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Thanks,
> > Shawn
> >
> >
>
>