Re: What exactly happens to extant documents when the schema changes?

2013-05-29 Thread Dotan Cohen
On Tue, May 28, 2013 at 3:58 PM, Jack Krupansky  wrote:
> The technical answer: Undefined and not guaranteed.
>

I was afraid of that!

> Sure, you can experiment and see what the effects "happen" to be in any
> given release, and maybe they don't tend to change (too much) between most
> releases, but there is no guarantee that any given "change schema but keep
> existing data without a delete of directory contents and full reindex" will
> actually be benign or what you expect.
>
> As a general proposition, when it comes to changing the schema and not
> deleting the directory and doing a full reindex, don't do it! Of course, we
> all know not to try to walk on thin ice, but a lot of people will try to do
> it anyway - and maybe it happens that most of the time the results are
> benign.
>

In the case of this particular application, reindexing really is
overly burdensome as the application is performing hundreds of writes
to the index per minute. How might I gauge how much spare I/O Solr
could commit to a reindex? All the data that I need is in fact in
stored fields.

Note that because the social media application that feeds our Solr
index is global, there are no 'off hours'.


> OTOH, you could file a Jira to propose that the effects of changing the
> schema but keeping the existing data should be precisely defined and
> documented, but, that could still change from release to release.
>

Seems like a lot of effort to document, for little benefit. I'm not
going to file it. I would like to know, though: is the schema
consulted at index time, query time, or both?


> From a practical perspective for your original question: If you suddenly add
> a field, there is no guarantee what will happen when you try to access that
> field for existing documents, or what will happen if you "update" existing
> documents. Sure, people can talk about what "happens to be true today", but
> there is no guarantee for the future. Similarly for deleting a field from
> the schema, there is no guarantee about the status of existing data, even
> though people can chatter about "what it seems to do today."
>
> Generally, you should design your application around contracts and what is
> guaranteed to be true, not what happens to be true from experiments or even
> experience. Granted, that is the theory and sometimes you do need to rely on
> experimentation and folklore and spotty or ambiguous documentation, but to
> the extent possible, it is best to avoid explicitly trying to rely on
> undocumented, uncontracted behavior.
>

Thanks. The application does change (added features) and we do not
want to lose old data.


> One question I asked long ago and never received an answer: what is the best
> practice for doing a full reindex - is it sufficient to first do a delete of
> "*:*", or does the Solr index directory contents or even the directory
> itself need to be explicitly deleted first? I believe it is the latter, but
> the former "seems" to work, most of the time. Deleting the directory itself
> "seems" to be the best answer, to date - but no guarantees!
>

I don't have an answer for that, sorry!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Choosing specific fields for suggestions in SpellCheckerComponent

2013-05-29 Thread Shalin Shekhar Mangar
Hi Wilson,

I don't think SpellCheckComponent supports multiple fields in the same
dictionary. Am I missing something?


On Wed, May 29, 2013 at 10:24 AM, Wilson Passos  wrote:

> Hi everyone,
>
>
> I've been searching for how to configure the SpellCheckerComponent in
> Solr 4.0 to support suggestion queries based on a subset of the configured
> fields in schema.xml. Let's say the spell checking is configured to use
> these 4 fields:
>
> 
> 
> 
> 
>
> I'd like to know if there's any possibility to dynamically set the
> SpellCheckerComponent to suggest terms using just fields "field2" and
> "field3" instead of the default behavior, which always includes suggestions
> across the 4 defined fields.
>
> Thanks in advance for any help!
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-29 Thread Shalin Shekhar Mangar
I have opened https://issues.apache.org/jira/browse/SOLR-4870


On Tue, May 28, 2013 at 5:53 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> This sounds like a bug. I'll open an issue. Thanks!
>
>
> On Tue, May 28, 2013 at 2:29 PM, AlexeyK  wrote:
>
>> The cluster state problem reported above is not an issue - it was caused
>> by
>> our own code.
>> Speaking about the update log - I have noticed a strange behavior
>> concerning
>> the replay. The replay is *supposed* to be done for a predefined number of
>> log entries, but actually it is always done for the whole last 2 tlogs.
>> RecentUpdates.update() reads the log within  while (numUpdates <
>> numRecordsToKeep), but numUpdates is never incremented, so it exits when
>> the reader reaches EOF.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4066452.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-29 Thread Shalin Shekhar Mangar
On Thu, May 23, 2013 at 7:00 PM, AlexeyK  wrote:




> From what I understood from the code, for each 'add' command there is a
> test
> for a 'delete by query'. If there is an older DBQ, it's run after the 'add'
> operation if its version > the 'add' version.
> In my case, there are a lot of documents to be inserted, and a single large
> DBQ. My question is: shouldn't this be done in bulk? Why is it necessary
> to
> run the DBQ after each insertion? Suppose there are 1000 insertions; it's
> run 1000 times.
>
>
>
As I understand it, this is done to handle out-of-order updates. Suppose a
client makes a few add requests and then invokes a DBQ but the DBQ reaches
the replicas before the last add request. In such a case, the DBQ is
executed after the add request to preserve consistency. We don't do that in
bulk because we don't know how long to wait for all add requests to arrive.
Also, the individual add requests may arrive via different threads (think
connection reset from leader to replica).

That being said, the scenario you describe of 1000 insertions causing
DBQs to be run a large number of times (on recovery after restarting) could
be optimized. Note that the bug you discovered (SOLR-4870) does not affect
log replay because log replay on startup will replay all of the last two
transaction logs (unless they end with a commit). Only PeerSync is affected
by SOLR-4870.

You say that both nodes are leaders but the comment inside
DirectUpdateHandler2.addDoc() says that deletesAfter (i.e. reordered DBQs)
should always be null on leaders. So there's definitely something fishy
here. A quick review of the code leads me to believe that reordered DBQs
can happen on a leader as well. I'll investigate further.


> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4065628.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Sorting results by last update date

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 12:10 PM, Kamal Palei  wrote:

> Hi All
> I am trying to sort the results as per last updated date. My url looks as
> below.
>
> *&fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO
> 588]&fq=salary:[0 TO 500] OR
>
> salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java
>
> +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java
>
> +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&
> json.nl=map&sort=last_updated_date asc
> *
> With this I get the data in ascending order of last updated date.
>
> If I am trying to sort data in descending order, I use below url
>
> *&fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO
> 588]&fq=salary:[0 TO 500] OR
>
> salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java
>
> +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java
>
> +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&
> json.nl=map&sort=last_updated_date desc*
>
> Here the data set is not ordered properly; it looks to me like the data is
> ordered on the basis of score, not last updated date.
>
> Can somebody tell me what I am missing here and why *desc* is not working
> properly for me?
>
>
What is the field type of last_update_date? Which version of Solr?

A side note: Using NOW in a filter query is inefficient because it doesn't
use your filter cache effectively. Round it to the nearest time interval
instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter
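
For example, a rounded version of your filter would look something like this
from SolrJ (just a sketch; it assumes a 4.x core at http://localhost:8983/solr):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class RoundedDateFilter {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("+java +sip");
        // NOW/DAY rounds to midnight, so every request made on the same day sends an
        // identical filter string and can reuse the cached filter entry.
        q.addFilterQuery("last_updated_date:[NOW/DAY-60DAYS TO NOW/DAY+1DAY]");
        q.setSort("last_updated_date", SolrQuery.ORDER.desc);
        System.out.println(solr.query(q).getResults().getNumFound() + " hits");
    }
}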

-- 
Regards,
Shalin Shekhar Mangar.


Re: delta-import tweaking?

2013-05-29 Thread Kristian Rink
Hi Shawn;

and first off, thanks bunches for your pointers.

Am Tue, 28 May 2013 09:31:54 -0600
schrieb Shawn Heisey :
> My workaround was to store the highest indexed autoincrement value in
> a location outside Solr.  In my original Perl code, I dropped it into
> a file on NFS.  The latest iteration of my indexing code (Java, using 
> SolrJ) no longer uses DIH for regular indexing, but it still uses
> that stored autoincrement value, this time in another database
> table.  I do still use full-import for complete index rebuilds.

Well, overall, after playing with it a bit last night, I decided to also
go down the SolrJ route; we're likely to use this in the future anyway,
as the rest of our environment is Java too, so going for it right now
seems like the logical thing to do.

Thanks and all the best! 
Kristian 


Reindexing strategy

2013-05-29 Thread Dotan Cohen
I see that I do need to reindex my Solr index. The index consists of
20 million documents with a few hundred new documents added per minute
(social media data). The documents are mostly smaller than 1KiB of
data, but some may go as large as 10 KiB. All the data is text, and
all indexed fields are stored.

To reindex, I am considering adding a 'last_indexed' field, and having
a Python or Java application pull out N results every T seconds when
sorting on "last_indexed asc". How might I determine a good values for
N and T? I would like to know when the Solr index is 'overloaded', or
whatever happens to Solr when it is being pushed beyond the limits of
its hardware. What should I be looking at to know if Solr is over
stressed? Is looking at CPU and memory good enough? Is there a way to
measure I/O to the disk on which the Solr index is stored? Bear in
mind that while the reindex is happening, clients will be performing
searches and a few hundred documents will be written per minute. Note
that the machine running Solr is an EC2 instance running on Amazon Web
Services, and that the 'disk' on which the Solr index is stored is an
EBS volume.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Strange behavior on text field with number-text content

2013-05-29 Thread Erick Erickson
Hmmm, there are two things you _must_ get familiar with when diagnosing
these ..

1> admin/analysis. That'll show you exactly what the analysis chain does,
and it's
 not always obvious.
2> add &debug=query to your input and look at the parsed query results. For
instance,
 this "name:4nSolution Inc." parses as name:4nSolution defaultfield:inc.

That doesn't explain why name:4nSolution doesn't match, except...

your index chain has splitOnCaseChange=1 and your query bit has
splitOnCaseChange=0
which doesn't seem right

Best
Erick


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой  wrote:

> solr-user-unsubscribe 
>
>
> 2013/5/28 Michał Matulka 
>
>>  Thanks for your responses, I must admit that after hours of trying I
>> made some mistakes.
>> So the most problematic phrase will now be:
>> "4nSolution Inc." which cannot be found using query:
>>
>> name:4nSolution
>>
>> or even
>>
>> name:4nSolution Inc.
>>
>> but can be using following queries:
>>
>> name:nSolution
>> name:4
>> name:inc
>>
>> Sorry for the mess, it turned out I didn't reindex fields after modifying
>> the schema, so I thought that the problem also applied to 300letters.
>>
>> The cause of all of this is the WordDelimiter filter defined as following:
>>
>> 
>>   
>> 
>> 
>> 
>> > ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>> preserveOriginal="1"/>
>> 
>> > language="English" protected="protwords.txt"/>
>>   
>>   
>> 
>> > ignoreCase="true" expand="true"/>
>> > ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
>> preserveOriginal="1" />
>> 
>> > language="English" protected="protwords.txt"/>
>>   
>> 
>>
>> and I still don't know why it behaves like that - after all, the
>> "preserveOriginal" attribute is set to 1...
>>
>> W dniu 28.05.2013 14:21, Erick Erickson pisze:
>>
>> Hmmm, with 4.x I get much different behavior than you're
>> describing, what version of Solr are you using?
>>
>> Besides Alex's comments, try adding &debug=query to the url and see what 
>> comes
>> out from the query parser.
>>
>> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
>> do
>> any analysis, here's the javadoc...
>>  /**
>>* Default analyzer for types that only produces 1 verbatim token...
>>* A maximum size of chars to be read must be specified
>>*/
>>
>> so it's much like the "string" type. Which means I'm totally perplexed by 
>> your
>> statement that 300 and letters return a hit. Have you perhaps changed the
>> field definition and not re-indexed?
>>
>> The behavior you're seeing really looks like somehow 
>> WordDelimiterFilterFactory
>> is getting into your analysis chain with settings that don't mash the parts 
>> back
>> together, i.e. you can set up WDDF to split on letter/number transitions, 
>> index
>> each and NOT index the original, but I have no explanation for how that
>> could happen with the field definition you indicated
>>
>> FWIW,
>> Erick
>>
>> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch 
>>  wrote:
>>
>>   What does analyzer screen say in the Web AdminUI when you try to do that?
>> Also, what are the tokens stored in the field (also in Web AdminUI).
>>
>> I think it is very strange to have TextField without a tokenizer chain.
>> Maybe you get a standard one assigned by default, but I don't know what the
>> standard chain would be.
>>
>> Regards,
>>
>>   Alex.
>> On 28 May 2013 04:44, "Michał Matulka"  
>>  wrote:
>>
>>
>>  Hello,
>>
>> I've got following problem. I have a text type in my schema and a field
>> "name" of that type.
>> That field contains a data, there is, for example, record that has
>> "300letters" as name.
>>
>> Now field type definition:
>> 
>>
>> And, of course, field definition:
>> 
>>
>> yes, that's all - there are no tokenizers.
>>
>> And now time for my question:
>>
>> Why following queries:
>>
>> name:300
>>
>> and
>>
>> name:letters
>>
>> are returning that result, but:
>>
>> name:300letters
>>
>> is not (0 results)?
>>
>> Best regards,
>> Michał Matulka
>>
>>
>>
>>
>> --
>>  Pozdrawiam,
>> Michał Matulka
>>  Programista
>>  michal.matu...@gowork.pl
>>
>>
>>  ul. Zielna 39
>>  00-108 Warszawa
>>  www.GoWork.pl
>>
>
>


Re: Note on The Book

2013-05-29 Thread Erick Erickson
FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky  wrote:
> We'll have a blog for the book. We hope to have a first
> raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
> As soon as we get that process under control, we'll start the blog. I'll
> keep your email on file and keep you posted.
>
> -- Jack Krupansky
>
> -Original Message- From: Swati Swoboda
> Sent: Tuesday, May 28, 2013 1:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Note on The Book
>
>
> I'd definitely prefer the spiral bound as well. E-books are great and your
> draft version seems very reasonably priced (aka I would definitely get it).
>
> Really looking forward to this. Is there a separate mailing list / etc. for
> the book for those who would like to receive updates on the status of the
> book?
>
> Thanks
>
> Swati Swoboda
> Software Developer - Igloo Software
> +1.519.489.4120  sswob...@igloosoftware.com
>
> Bring back Cake Fridays – watch a video you’ll actually like
> http://vimeo.com/64886237
>
>
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Thursday, May 23, 2013 7:15 PM
> To: solr-user@lucene.apache.org
> Subject: Note on The Book
>
> To those of you who may have heard about the Lucene/Solr book that I and two
> others are writing on Lucene and Solr, some bad and good news. The bad news:
> The book contract with O’Reilly has been canceled. The good news: I’m going
> to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat
> reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of
> the previous effort was too great, even for O’Reilly – a book larger than
> 800 pages (or even 600) that was heavy on reference and lighter on “guide”
> just wasn’t fitting in with their traditional “guide” model. In truth, Solr
> is just too complex for a simple guide that covers it all, let alone Lucene
> as well.
>
> I’ll announce more details in the coming weeks, but I expect to publish an
> e-book-only version of the book, focused on Solr reference (and plenty of
> guide as well), possibly on Lulu, plus eventually publish 4-8 individual
> print volumes for people who really want the paper. One model I may pursue
> is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book,
> with the promise of updates every two weeks or a month as new and revised
> content and new releases of Solr become available. Maybe the individual
> e-book volumes would be $2 or $3. These are just preliminary ideas. Feel
> free to let me know what seems reasonable or excessive.
>
> For paper: Do people really want perfect bound, or would you prefer spiral
> bound that lies flat and folds back easily? I suppose we could offer both –
> which should be considered “premium”?
>
> I’ll announce more details next week. The immediate goal will be to get the
> “raw rough draft” available to everyone ASAP.
>
> For those of you who have been early reviewers – your effort will not have
> been in vain. I have all your comments and will address them over the next
> month or two or three.
>
> Just for some clarity, the existing Solr Wiki and even the recent
> contribution of the LucidWorks Solr Reference to Apache really are still
> great contributions to general knowledge about Solr, but the book is
> intended to go much deeper into detail, especially with loads of examples
> and a lot more narrative guide. For example, the book has a complete list of
> the analyzer filters, each with a clean one-liner description. Ditto for
> every parameter (although I would note that the LucidWorks Solr Reference
> does a decent job of that as well.) Maybe, eventually, everything in the
> book COULD (and will) be integrated into the standard Solr doc, but until
> then, a single, integrated reference really is sorely needed. And, the book
> has a lot of narrative guide and walking through examples as well. Over
> time, I’m sure both will evolve. And just to be clear, the book is not a
> simple repurposing of the Solr wiki content – EVERY description of
> everything has been written fresh, from scratch. So, for example, analyzer
> filters get both short one-liner summary descriptions as well as more
> detailed descriptions, plus formal attribute specifications and numerous
> examples, including sample input and outputs (the LucidWorks Solr Reference
> does a better job with examples as well.)
>
> The book has been written in parallel with branch_4x and that will continue.
>
> -- Jack Krupansky


Re: Keeping a rolling window of indexes around solr

2013-05-29 Thread Erick Erickson
I suspect you're worrying about something you don't need to. At 1 insert every
30 seconds, and assuming 30,000,000 records will fit on a machine (I've seen
this), you're talking 900,000,000 seconds' worth of data on a single box!
Or roughly
10,000 days' worth of data. Test, of course, YMMV.

Or maybe I'm misunderstanding what "1 log insert" means; I guess it could be a full
log file.

But do the simple thing first, just let Solr do what it does by
default and periodically
do a delete by query on documents you want to roll off the end. Especially since
you say that queries happen every few days. The tricks for utilizing
"hot shards" are
probably not very useful for you with that low a query rate.
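
Something like this SolrJ sketch, run from cron, would do it (the core name
"logs", the field name "timestamp" and the 30-day window are just examples):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class RollOff {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/logs");
        // Drop everything older than the retention window, then make it visible.
        solr.deleteByQuery("timestamp:[* TO NOW/DAY-30DAYS]");
        solr.commit();
    }
}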

Test, of course
Best
Erick

On Tue, May 28, 2013 at 8:42 PM, Saikat Kanjilal  wrote:
> Volume of data:
> 1 log insert every 30 seconds, queries done sporadically asynchronously every 
> so often at a much lower frequency every few days
>
> Also the majority of the requests are indeed going to be within a splice of 
> time (typically hours or at most a few days)
>
> Type of queries:
> Keyword or termsearch
> Search by guid (or id as known in the solr world)
> Reserved or percolation queries to be executed when new data becomes available
> Search by dates as mentioned above
>
> Regards
>
>
> Sent from my iPhone
>
> On May 28, 2013, at 4:25 PM, Chris Hostetter  wrote:
>
>>
>> : This is kind of the approach used by elastic search , if I'm not using
>> : solrcloud will I be able to use shard aliasing, also with this approach
>> : how would replication work, is it even needed?
>>
>> you haven't said much about the volume of data you expect to deal with,
>> nor have you really explained what types of queries you intend to do --
>> ie: you said you were interested in a "rolling window of indexes
>> around n days of data" but you never clarified why you think a
>> rolling window of indexes would be useful to you or how exactly you would
>> use it.
>>
>> The primary advantage of sharding by date is if you know that a large
>> percentage of your queries are only going to be within a small range of
>> time, and therefore you can optimize those requests to only hit the shards
>> necessary to satisfy that small window of time.
>>
>> If the majority of requests are going to be across your entire "n days" of
>> data, then date-based sharding doesn't really help you -- you can just use
>> arbitrary (randomized) sharding using periodic deleteByQuery commands to
>> purge anything older than N days.  Query the whole collection by default,
>> and add a filter query if/when you want to restrict your search to only a
>> narrow date range of documents.
>>
>> this is the same general approach you would use on a non-distributed /
>> non-SolrCloud setup if you just had a single collection on a single master
>> replicated to some number of slaves for horizontal scaling.
>>
>>
>> -Hoss
>>


Re: split document or not

2013-05-29 Thread Hard_Club
But in this case the phrase frequency over the whole document will not be taken into
account, because the document is split into subdocuments. Or is that not true?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/split-document-or-not-tp4066170p4066734.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Note on The Book

2013-05-29 Thread Yago Riveiro
IMHO I prefer narrative. As Erick says, explaining all use-cases is impossible; 
covering the base cases is a good start.  Either way, I miss a book about Solr 
different from a cookbook or a guide.  

Regards.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, May 29, 2013 at 12:19 PM, Erick Erickson wrote:

> FWIW, picking up on Alexandre's point. One of my continual
> frustrations with virtually _all_
> technical books is they become endless pages of details without ever
> mentioning why
> the hell I should care. Unfortunately, explaining use-cases for
> everything would only make
> the book about 10,000 pages long. Siiigh.
>  
> I guess you can take this as a vote for narrative
>  
> Erick
>  
> On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky  (mailto:j...@basetechnology.com)> wrote:
> > We'll have a blog for the book. We hope to have a first
> > raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
> > As soon as we get that process under control, we'll start the blog. I'll
> > keep your email on file and keep you posted.
> >  
> > -- Jack Krupansky
> >  
> > -Original Message- From: Swati Swoboda
> > Sent: Tuesday, May 28, 2013 1:36 PM
> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
> > Subject: RE: Note on The Book
> >  
> >  
> > I'd definitely prefer the spiral bound as well. E-books are great and your
> > draft version seems very reasonably priced (aka I would definitely get it).
> >  
> > Really looking forward to this. Is there a separate mailing list / etc. for
> > the book for those who would like to receive updates on the status of the
> > book?
> >  
> > Thanks
> >  
> > Swati Swoboda
> > Software Developer - Igloo Software
> > +1.519.489.4120 sswob...@igloosoftware.com 
> > (mailto:sswob...@igloosoftware.com)
> >  
> > Bring back Cake Fridays – watch a video you’ll actually like
> > http://vimeo.com/64886237
> >  
> >  
> > -Original Message-
> > From: Jack Krupansky [mailto:j...@basetechnology.com]
> > Sent: Thursday, May 23, 2013 7:15 PM
> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
> > Subject: Note on The Book
> >  
> > To those of you who may have heard about the Lucene/Solr book that I and two
> > others are writing on Lucene and Solr, some bad and good news. The bad news:
> > The book contract with O’Reilly has been canceled. The good news: I’m going
> > to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat
> > reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of
> > the previous effort was too great, even for O’Reilly – a book larger than
> > 800 pages (or even 600) that was heavy on reference and lighter on “guide”
> > just wasn’t fitting in with their traditional “guide” model. In truth, Solr
> > is just too complex for a simple guide that covers it all, let alone Lucene
> > as well.
> >  
> > I’ll announce more details in the coming weeks, but I expect to publish an
> > e-book-only version of the book, focused on Solr reference (and plenty of
> > guide as well), possibly on Lulu, plus eventually publish 4-8 individual
> > print volumes for people who really want the paper. One model I may pursue
> > is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book,
> > with the promise of updates every two weeks or a month as new and revised
> > content and new releases of Solr become available. Maybe the individual
> > e-book volumes would be $2 or $3. These are just preliminary ideas. Feel
> > free to let me know what seems reasonable or excessive.
> >  
> > For paper: Do people really want perfect bound, or would you prefer spiral
> > bound that lies flat and folds back easily? I suppose we could offer both –
> > which should be considered “premium”?
> >  
> > I’ll announce more details next week. The immediate goal will be to get the
> > “raw rough draft” available to everyone ASAP.
> >  
> > For those of you who have been early reviewers – your effort will not have
> > been in vain. I have all your comments and will address them over the next
> > month or two or three.
> >  
> > Just for some clarity, the existing Solr Wiki and even the recent
> > contribution of the LucidWorks Solr Reference to Apache really are still
> > great contributions to general knowledge about Solr, but the book is
> > intended to go much deeper into detail, especially with loads of examples
> > and a lot more narrative guide. For example, the book has a complete list of
> > the analyzer filters, each with a clean one-liner description. Ditto for
> > every parameter (although I would note that the LucidWorks Solr Reference
> > does a decent job of that as well.) Maybe, eventually, everything in the
> > book COULD (and will) be integrated into the standard Solr doc, but until
> > then, a single, integrated reference really is sorely needed. And, the book
> > has a lot of narrative guide and walking through examples as 

How can a Tokenizer be CoreAware?

2013-05-29 Thread Benson Margulies
I am currently testing some things with Solr 4.0.0. I tried to make a
tokenizer CoreAware, and was rewarded with:

Caused by: org.apache.solr.common.SolrException: Invalid 'Aware'
object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 --
org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
[org.apache.solr.request.SolrRequestHandler]
[org.apache.solr.response.QueryResponseWriter]
[org.apache.solr.handler.component.SearchComponent]
[org.apache.solr.update.processor.UpdateRequestProcessorFactory]
[org.apache.solr.handler.component.ShardHandlerFactory]

I need this to allow cleanup of some cached items in the tokenizer.

Questions:

1: will a newer version allow me to do this directly?
2: is there some other approach that anyone would recommend? I could,
for example, make a fake object in the list above to act as a
singleton with a static accessor, but that seems pretty ugly.
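
For concreteness, the ugly workaround I have in mind would look roughly like
this (an untested sketch; the class name is invented, and it has to be
referenced from an updateRequestProcessorChain in solrconfig.xml so that
inform() actually gets called):

import org.apache.solr.core.SolrCore;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.apache.solr.util.plugin.SolrCoreAware;

public class CoreHolderProcessorFactory extends UpdateRequestProcessorFactory
        implements SolrCoreAware {

    private static volatile SolrCore core;          // backing field for the static accessor

    public static SolrCore getCore() { return core; }

    @Override
    public void inform(SolrCore c) {                // called once the core is ready
        core = c;
    }

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return next;                                // pass-through; this factory only holds the core
    }
}

A static holder like this only behaves with a single core per JVM, which is
part of why it feels so ugly.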


Re: Reindexing strategy

2013-05-29 Thread Upayavira
I presume you are running Solr on a multi-core/CPU server. If you kept a
single process hitting Solr to re-index, you'd be using just one of
those cores. It would take as long as it takes, I can't see how you
would 'overload' it that way. 

I guess you could have a strategy that pulls 100 documents with an old
last_indexed, and push them for re-indexing. If you get the full 100
docs, you make a subsequent request immediately. If you get less than
100 back, you know you're up-to-date and can wait, say, 30s before
making another request.

Upayavira

On Wed, May 29, 2013, at 12:00 PM, Dotan Cohen wrote:
> I see that I do need to reindex my Solr index. The index consists of
> 20 million documents with a few hundred new documents added per minute
> (social media data). The documents are mostly smaller than 1KiB of
> data, but some may go as large as 10 KiB. All the data is text, and
> all indexed fields are stored.
> 
> To reindex, I am considering adding a 'last_indexed' field, and having
> a Python or Java application pull out N results every T seconds when
> sorting on "last_indexed asc". How might I determine a good values for
> N and T? I would like to know when the Solr index is 'overloaded', or
> whatever happens to Solr when it is being pushed beyond the limits of
> its hardware. What should I be looking at to know if Solr is over
> stressed? Is looking at CPU and memory good enough? Is there a way to
> measure I/O to the disk on which the Solr index is stored? Bear in
> mind that while the reindex is happening, clients will be performing
> searches and a few hundred documents will be written per minute. Note
> that the machine running Solr is an EC2 instance running on Amazon Web
> Services, and that the 'disk' on which the Solr index is stored is an
> EBS volume.
> 
> Thank you.
> 
> --
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com


Re: Reindexing strategy

2013-05-29 Thread Dotan Cohen
On Wed, May 29, 2013 at 2:41 PM, Upayavira  wrote:
> I presume you are running Solr on a multi-core/CPU server. If you kept a
> single process hitting Solr to re-index, you'd be using just one of
> those cores. It would take as long as it takes, I can't see how you
> would 'overload' it that way.
>

I mean 'overload' Solr in the sense that it cannot read, process, and
write data fast enough because too much data is being handled. I
remind you that this system is writing hundreds of documents per
minute. Certainly there is a limit to what Solr can handle. I ask how
to know how close I am to this limit.


> I guess you could have a strategy that pulls 100 documents with an old
> last_indexed, and push them for re-indexing. If you get the full 100
> docs, you make a subsequent request immediately. If you get less than
> 100 back, you know you're up-to-date and can wait, say, 30s before
> making another request.
>

Actually, I would add a filter query for documents whose last_indexed
value is before the last schema change, and stop when fewer documents
were returned than were requested.
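
Roughly, the loop I have in mind would be something like this sketch with
SolrJ (batch size, sleep interval and the schema-change timestamp are made up,
and it assumes every field I need is stored):

import java.util.Date;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class ReindexLoop {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        final int n = 100;                                  // batch size N
        final String schemaChange = "2013-05-29T00:00:00Z"; // when the schema changed

        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            // Only documents not yet reindexed since the schema change.
            q.addFilterQuery("last_indexed:[* TO " + schemaChange + "]");
            q.setSort("last_indexed", SolrQuery.ORDER.asc);
            q.setRows(n);
            SolrDocumentList batch = solr.query(q).getResults();

            for (SolrDocument d : batch) {
                SolrInputDocument in = new SolrInputDocument();
                for (String f : d.getFieldNames()) {
                    if (!"_version_".equals(f)) {           // don't copy the internal version field
                        in.addField(f, d.getFieldValue(f));
                    }
                }
                in.setField("last_indexed", new Date());    // mark as reindexed
                solr.add(in, 60000);                        // commitWithin 60s
            }

            if (batch.size() < n) break;                    // fewer than N back: caught up
            Thread.sleep(30 * 1000);                        // pause T seconds between batches
        }
    }
}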

Thanks.


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Note on The Book

2013-05-29 Thread Alexandre Rafalovitch
Perhaps, you will enjoy mine then:
http://www.packtpub.com/apache-solr-for-indexing-data/book .

I will send a formal announcement to the list a little later, but
basically this is a book for advanced beginners and early
intermediates and takes them from a basic index to multilingual
indexing with bells and whistles. Covers a small part of Solr (Solr is
big!), but shows how different parts work together. It's structured as
a cookbook but the narrative is a journey.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, May 29, 2013 at 7:33 AM, Yago Riveiro  wrote:
> IMHO I prefer narrative. As Erick says, explaining all use-cases is 
> impossible; covering the base cases is a good start.  Either way, I miss a book 
> about Solr different from a cookbook or a guide.
>
> Regards.
>
> --
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Wednesday, May 29, 2013 at 12:19 PM, Erick Erickson wrote:
>
>> FWIW, picking up on Alexandre's point. One of my continual
>> frustrations with virtually _all_
>> technical books is they become endless pages of details without ever
>> mentioning why
>> the hell I should care. Unfortunately, explaining use-cases for
>> everything would only make
>> the book about 10,000 pages long. Siiigh.
>>
>> I guess you can take this as a vote for narrative
>>
>> Erick
>>
>> On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky > (mailto:j...@basetechnology.com)> wrote:
>> > We'll have a blog for the book. We hope to have a first
>> > raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks.
>> > As soon as we get that process under control, we'll start the blog. I'll
>> > keep your email on file and keep you posted.
>> >
>> > -- Jack Krupansky
>> >
>> > -Original Message- From: Swati Swoboda
>> > Sent: Tuesday, May 28, 2013 1:36 PM
>> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
>> > Subject: RE: Note on The Book
>> >
>> >
>> > I'd definitely prefer the spiral bound as well. E-books are great and your
>> > draft version seems very reasonably priced (aka I would definitely get it).
>> >
>> > Really looking forward to this. Is there a separate mailing list / etc. for
>> > the book for those who would like to receive updates on the status of the
>> > book?
>> >
>> > Thanks
>> >
>> > Swati Swoboda
>> > Software Developer - Igloo Software
>> > +1.519.489.4120 sswob...@igloosoftware.com 
>> > (mailto:sswob...@igloosoftware.com)
>> >
>> > Bring back Cake Fridays – watch a video you’ll actually like
>> > http://vimeo.com/64886237
>> >
>> >
>> > -Original Message-
>> > From: Jack Krupansky [mailto:j...@basetechnology.com]
>> > Sent: Thursday, May 23, 2013 7:15 PM
>> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
>> > Subject: Note on The Book
>> >
>> > To those of you who may have heard about the Lucene/Solr book that I and 
>> > two
>> > others are writing on Lucene and Solr, some bad and good news. The bad 
>> > news:
>> > The book contract with O’Reilly has been canceled. The good news: I’m going
>> > to proceed with self-publishing (possibly on Lulu or even Amazon) a 
>> > somewhat
>> > reduced scope Solr-only Reference Guide (with hints of Lucene). The scope 
>> > of
>> > the previous effort was too great, even for O’Reilly – a book larger than
>> > 800 pages (or even 600) that was heavy on reference and lighter on “guide”
>> > just wasn’t fitting in with their traditional “guide” model. In truth, Solr
>> > is just too complex for a simple guide that covers it all, let alone Lucene
>> > as well.
>> >
>> > I’ll announce more details in the coming weeks, but I expect to publish an
>> > e-book-only version of the book, focused on Solr reference (and plenty of
>> > guide as well), possibly on Lulu, plus eventually publish 4-8 individual
>> > print volumes for people who really want the paper. One model I may pursue
>> > is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book,
>> > with the promise of updates every two weeks or a month as new and revised
>> > content and new releases of Solr become available. Maybe the individual
>> > e-book volumes would be $2 or $3. These are just preliminary ideas. Feel
>> > free to let me know what seems reasonable or excessive.
>> >
>> > For paper: Do people really want perfect bound, or would you prefer spiral
>> > bound that lies flat and folds back easily? I suppose we could offer both –
>> > which should be considered “premium”?
>> >
>> > I’ll announce more details next week. The immediate goal will be to get the
>> > “raw rough draft” available to everyone ASAP.
>> >
>> > For those of you who have been early reviewers – your effort will not have
>> > been in vain. I have all your comments and will address them

Problem with xpath expression in data-config.xml

2013-05-29 Thread Hans-Peter Stricker
Replacing the contents of 
solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml


by


   
   
   url="http://beautybooks88.blogspot.com/feeds/posts/default"; 
processor="XPathEntityProcessor" forEach="/feed/entry" 
transformer="DateFormatTransformer">


			commonField="true" />




			stripHTML="true"/>



			dateTimeFormat="-MM-dd'T'HH:mm:ss" />


   


and running the full dataimport from 
http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.


1) How could I have found the reason faster than I did - by looking into 
which log files?


2) If you remove the first occurrence of /@href above, the import succeeds. 
(Note that the same pattern works for the "link" column.) Why is that?


Best regards and thanks in advance

Hans-Peter 





Advice : High-traffic web site

2013-05-29 Thread Ramzi Alqrainy
Hi Team,

Please, I need your advice. I have a high-traffic web site (100 million page
views/month) serving 22 countries, and I want to build a fast and powerful search
engine. I use Solr 4.3 and separate every country into its own collection, but I
want to build the right structure to accommodate the high traffic. What would you
advise me to use: SolrCloud, master-slave, or multi-cores?


Thanks in advance. 
Ramzi,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Advice-High-traffic-web-site-tp4066745.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: split document or not

2013-05-29 Thread Hard_Club
Do I need to first search for the whole document's ID, and then search among its paragraphs
stored in separate docs?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/split-document-or-not-tp4066170p4066751.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Note on The Book

2013-05-29 Thread Jack Krupansky
Erick, your point is well taken. Although my primary interest/skill is to 
produce a solid foundation reference (including tons of examples), the real 
goal is to then build on top of that foundation.


While I focus on the hard-core material - which really does include some 
narrative and lots of examples in addition to tons of "mere" reference, my 
co-author, Ryan Tabora, will focus almost exclusively on... narrative and 
diagrams.


And when I say reference, I also mean lots of examples. Even as the 
hard-core reference stabilizes, the examples will continue to grow ("like 
weeds!").


Once we get the current, existing, under-review, chapters packaged into the 
new book and available for purchase and download (maybe Lulu, not decided) - 
available, in a couple of weeks, it will be updated approximately every 
other week, both with additional reference material, and additional 
narrative and diagrams.


One of our priorities (after we get through Stage 0 of the next few weeks) 
is to in fact start giving each of the long Deep Dive Chapters enough 
narrative lead to basically say exactly that - why you should care.


A longer-term priority is to improve the balance of narrative and hard-core 
reference. Yeah, that will be a lot of pages. It already is. We were at 907 
pages and I was about to drop in another 166 pages on update handlers when 
O'Reilly threw up their hands and pulled the plug. I was estimating 1200 
pages at that stage. And I'll probably have another 60-80 pages on update 
request processors within a week or so. With more to come. That did include 
a lot of hard-core material and example code for Lucene, which won't be in 
the new Solr-only book. By focusing on an e-book the raw page count alone 
becomes moot. We haven't given up on print - the intent is eventually to 
have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 
to $5 each) and slimmer print volumes for people who don't need everything 
in print.


In fact, we will likely offer the revamped initial chapters of the book as a 
standalone introduction to Solr - narrative introduction ("why should you 
care about Solr"), basic concepts of Lucene and Solr (and why you should 
care!), brief tutorial walkthough of the major feature areas of Solr, and a 
case study. The intent would be both e-book and a slim print volume (75 
pages?).


Another priority (beyond Stage 0) is to develop a detailed roadmap diagram 
of Solr and how applications can use Solr, and then use that to show how 
each of the Deep Dive sections fits in (heavy reference, but gradually adding more 
narrative over time).


We will probably be very open to requests - what people really wish a book 
would actually do for them. The only request we won't be open to is to do it 
all in only 300 pages.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky  
wrote:

We'll have a blog for the book. We hope to have a first
raw/rough/partial/draft published as an e-book in maybe 10 days to 2 
weeks.

As soon as we get that process under control, we'll start the blog. I'll
keep your email on file and keep you posted.

-- Jack Krupansky

-Original Message- From: Swati Swoboda
Sent: Tuesday, May 28, 2013 1:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Note on The Book


I'd definitely prefer the spiral bound as well. E-books are great and your
draft version seems very reasonably priced (aka I would definitely get 
it).


Really looking forward to this. Is there a separate mailing list / etc. 
for

the book for those who would like to receive updates on the status of the
book?

Thanks

Swati Swoboda
Software Developer - Igloo Software
+1.519.489.4120  sswob...@igloosoftware.com

Bring back Cake Fridays – watch a video you’ll actually like
http://vimeo.com/64886237


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, May 23, 2013 7:15 PM
To: solr-user@lucene.apache.org
Subject: Note on The Book

To those of you who may have heard about the Lucene/Solr book that I and 
two
others are writing on Lucene and Solr, some bad and good news. The bad 
news:
The book contract with O’Reilly has been canceled. The good news: I’m 
going
to proceed with self-publishing (possibly on Lulu or even Amazon) a 
somewhat
reduced scope Solr-only Reference Guide (with hints of Lucene). The scope 
of

the previous effort was too great, even for O’Reilly – a book larger than
800 pages (or even 600) that wa

Re: What exactly happens to extant documents when the schema changes?

2013-05-29 Thread Shawn Heisey
On 5/29/2013 1:07 AM, Dotan Cohen wrote:
> In the case of this particular application, reindexing really is
> overly burdensome as the application is performing hundreds of writes
> to the index per minute. How might I gauge how much spare I/O Solr
> could commit to a reindex? All the data that I need is in fact in
> stored fields.
> 
> Note that because the social media application that feeds our Solr
> index is global, there are no 'off hours'.

I handle this in a very specific way with my sharded index.  This won't
work for all designs, and the precise procedure won't work for SolrCloud.

There is a 'live' and a 'build' core for each of my shards.  When I want
to reindex, the program makes a note of my current position for deletes,
reinserts, and new documents.  Then I use a DIH full-import from mysql
into the build cores.  Once the import is done, I run the update cycle
of deletes, reinserts, and new documents on those build cores, using the
position information noted earlier.  Then I swap the cores so the new
index is online.
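
(For reference, the swap itself is just the stock CoreAdmin SWAP call - this is
a sketch with illustrative core names, not my actual tooling:)

import java.io.InputStream;
import java.net.URL;

public class SwapCores {
    public static void main(String[] args) throws Exception {
        // After the "build" core is fully imported and caught up, SWAP exchanges the
        // names so the freshly built index starts serving as "live".
        String swap = "http://localhost:8983/solr/admin/cores"
                    + "?action=SWAP&core=live&other=build&wt=json";
        try (InputStream in = new URL(swap).openStream()) {
            byte[] buf = new byte[4096];
            for (int len; (len = in.read(buf)) > 0; ) {
                System.out.write(buf, 0, len);              // echo the JSON response
            }
        }
    }
}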

To adapt this for SolrCloud, I would need to use two collections, and
update a collection alias for what is considered live.

To control the I/O and CPU usage, you might need some kind of throttling
in your update/rebuild application.

I don't need any throttling in my design.  Because I'm using DIH, the
import only uses a single thread for each shard on the server.  I've got
RAID10 for storage and half of the CPU cores are still available for
queries, so it doesn't overwhelm the server.

The rebuild does lower performance, so I have the other copy of the
index handle queries while the rebuild is underway.  When the rebuild is
done on one copy, I run it again on the other copy.  Right now I'm
half-upgraded -- one copy of my index is version 3.5.0, the other is
4.2.1.  Switching to SolrCloud with sharding and replication would
eliminate this flexibility, unless I maintained two separate clouds.

Thanks,
Shawn



[Announce] Apache Solr 4.1 with RankingAlgorithm 1.4.7 available now -- includes realtime-search with multiple granularities

2013-05-29 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.3 with 
RankingAlgorithm40 1.4.8 with realtime-search with multiple 
granularities. realtime-search is very fast NRT and allows you to not 
only look up a document by id but also to search in realtime; 
see http://tgels.org/realtime-nrt.jsp. The update performance is about 
70,000 docs / sec. The query performance is in ms, allowing you to query 
a 10m wikipedia index (complete index) in <50 ms.


This release includes realtime-search with multiple granularities, 
request/intra-request. The granularity attribute controls the NRT 
behavior. With attribute granularity="request", all search components 
like search, faceting, highlighting, etc. will see a consistent view of 
the index and will all report the same number of documents. With 
granularity="intrarequest", the components may each report the most 
recent changes to the index. realtime-search has been contributed back 
to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816.


RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or 
boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix 
queries with boosting, etc. and is compatible with the lucene 4.3 api.


You can get more information about realtime-search performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org

Note:
1. Apache Solr 4.1 with RankingAlgorithm40 1.4.7 is an external project.




Re: Advice : High-traffic web site

2013-05-29 Thread Shalin Shekhar Mangar
I don't see how multi-cores will help you. Either SolrCloud or Master-Slave
can work for you. Of course, SolrCloud helps you in terms of maintaining
higher availability due to replica/leader failover.

If your queries are always going to be limited to one country then creating
a collection per country is fine.


On Wed, May 29, 2013 at 6:12 PM, Ramzi Alqrainy wrote:

> Hi Team,
>
> Please, I need your advice. I have a high-traffic web site (100 million page
> views/month) serving 22 countries, and I want to build a fast and powerful search
> engine. I use Solr 4.3 and separate every country into its own collection, but I
> want to build the right structure to accommodate the high traffic. What would you
> advise me to use: SolrCloud, master-slave, or multi-cores?
>
>
> Thanks in advance.
> Ramzi,
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Advice-High-traffic-web-site-tp4066745.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Reindexing strategy

2013-05-29 Thread Shawn Heisey
On 5/29/2013 6:01 AM, Dotan Cohen wrote:
> I mean 'overload' Solr in the sense that it cannot read, process, and
> write data fast enough because too much data is being handled. I
> remind you that this system is writing hundreds of documents per
> minute. Certainly there is a limit to what Solr can handle. I ask how
> to know how close I am to this limit.

It's impossible for us to give you hard numbers.  You'll have to
experiment to know how fast you can reindex without killing your
servers.  A basic tenet for such experimentation, and something you
hopefully already know: You'll want to get baseline measurements before
you begin testing for comparison.

One of the most reliable Solr-specific indicators of pushing your
hardware too hard is that the QTime on your queries will start to
increase dramatically.  Solr 4.1 and later has more granular query time
statistics in the UI - the median and 95% numbers are much more
important than the average.

Outside of that, if your overall IOwait CPU percentage starts getting
near (or above) 30-50%, your server is struggling.  If all of your CPU
cores are staying near 100% usage, then it's REALLY struggling.

Assuming you have plenty of CPU cores, using fast storage and having
plenty of extra RAM will alleviate much of the I/O bottleneck.  The
usual rule of thumb for good query performance is that you need enough
RAM to put 50-100% of your index in the OS disk cache.  For blazing
performance during a rebuild, that becomes 100-200%.  If you had 150%,
that would probably keep most indexes well-cached even during a rebuild.

A rebuild will always lower performance, even with lots of RAM.

My earlier reply to your other message has some other ideas that will
hopefully help.

Thanks,
Shawn



Re: Replica shards not updating their index when update is sent to them

2013-05-29 Thread Sebastián Ramírez
I found how to solve the problem.

After sending a file to be indexed to a replica shard (node2):

curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type:
text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'

I can send a "commit" param to the same shard and then it gets updated:

curl 'http://node2:8983/solr/update?commit=true'


Another option is to send, from the beginning, a "commitWithin" param with
some milliseconds instead of a "commit" directly. That way, the commit
happens at most the specified number of milliseconds later, and the changes get
reflected in all shards, including the replica shard that received the
update request:

curl 
'http://node2:8983/solr/update?commitWithin=1
'
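
The same idea from SolrJ, as a sketch (the second argument to add() is the
commitWithin value in milliseconds; 10 seconds here is arbitrary):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer node2 = new HttpSolrServer("http://node2:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "asdf");
        doc.addField("content", "big moth");
        // The update becomes visible on every shard within the commitWithin window,
        // no explicit commit needed.
        node2.add(doc, 10000);
    }
}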


As these emails get archived, I hope this may help someone in the future.

Sebastián Ramírez


On Mon, May 20, 2013 at 4:32 PM, Sebastián Ramírez <
sebastian.rami...@senseta.com> wrote:

> Yes, It's happening with the latest version, 4.2.1
>
> Yes, it's easy to reproduce.
> It happened using 3 Virtual Machines and also happened using 3 physical
> nodes.
>
>
> Here are the details:
>
> I installed Hortonworks (a Hadoop distribution) in the 3 nodes. That
> installs Zookeeper.
>
> I used the "example" directory and copied it to the 3 nodes.
>
> I start Zookeeper in the 3 nodes.
>
> The first time, I run this command on each node, to start Solr:  java
> -jar -Dbootstrap_conf=true -DzkHost='node1,node2,node3'  start.jar
>
> As I understand, the "-Dbootstrap_conf=true" uploads the configuration to
> Zookeeper, so I don't need to do that the following times that I start each
> SolrCore.
>
> So, the following times, I run this on each node: java -jar
> -DzkHost='node0,node1,node2' start.jar
>
> Because I ran that command on node0 first, that node became the leader
> shard.
>
> I send an update to the leader shard, (in this case node0):
> I run curl 'http://node0:8983/solr/update?commit=true' -H 'Content-type:
> text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">buggy</field></doc></add>'
>
> When I query any shard I get the correct result:
> I run curl 'http://node0:8983/solr/select?q=id:asdf'
> or curl 'http://node1:8983/solr/select?q=id:asdf'
> or curl 'http://node2:8983/solr/select?q=id:asdf'
> (i.e. I send the query to each node), and then I get the expected response ...
> asdf buggy 
> ... ...
>
> But when I send an update to a replica shard (node2) it is updated only in
> the leader shard (node0) and in the other replica (node1), not in the shard
> that received the update (node2):
> I send an update to the replica node2,
> I run curl 'http://node2:8983/solr/update?commit=true' -H 'Content-type:
> text/xml' --data-binary '<add><doc><field name="id">asdf</field><field name="content">big moth</field></doc></add>'
>
> Then I query each node and I receive the updated results only from the
> leader shard (node0) and the other replica shard (node1).
>
> I run (leader, node0):
> curl 'http://node0:8983/solr/select?q=id:asdf'
> And I get:
> ... asdf big moth
>  ...  ...
>
> I run (other replica, node1):
> curl 'http://node1:8983/solr/select?q=id:asdf'
> And I get:
> ... asdf big moth
>  ...  ...
>
> I run (first replica, the one that received the update, node2):
> curl 'http://node2:8983/solr/select?q=id:asdf'
> And I get (old result):
> ... asdf buggy
>  ...  ...
>
> Thanks for your interest,
>
> Sebastián Ramírez
>
>
> On Mon, May 20, 2013 at 3:30 PM, Yonik Seeley wrote:
>
>> On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez
>>  wrote:
>> > When I send an update to a non-leader (replica) shard (B), the updated
>> > results are reflected in the leader shard (A) and in the other replica
>> > shard (C), but not in the shard that received the update (B).
>>
>> I've never seen that before.  The replica that received the update
>> isn't treated as special in any way by the code, so it's not clear how
>> this could happen.
>>
>> What version of Solr is this (and does it happen with the latest
>> version)?  How easy is this to reproduce for you?
>>
>> -Yonik
>> http://lucidworks.com
>>
>
>



RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Andy,

I opened this ticket so that someone can eventually investigate: 
https://issues.apache.org/jira/browse/SOLR-4874

Just a sanity check: I see I had misspelled "maxCollations" as 
"maxCollation" in my prior response.  When you tested with this set the same as 
"maxCollationTries", did you correct my spelling?  The thought is that by 
requiring it to return this many collations back, you are guaranteed to make it 
try the maximum number of combinations every time, giving yourself a cleaner 
test.  I am trying to isolate here whether spellcheck is not running the 
queries properly or whether the queries just naturally take that long to run 
over and over again.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Andy Lester [mailto:a...@petdance.com] 
Sent: Tuesday, May 28, 2013 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

Thanks for looking at this.

> What are the QTimes for the 0fq,1fq,2fq,4fq & 4fq cases with spellcheck 
> entirely turned off?  Is it about (or a little more than) half the total when 
> maxCollationTries=1 ?

With spellcheck off I get 8ms for 4fq query.


>  Also, with the varying # of fq's, how many collation tries does it take to 
> get 10 collations?

I don't know.  How can I tell?


> Possibly, a better way to test this is to set maxCollations = 
> maxCollationTries.  The reason is that it quits "trying" once it finds 
> "maxCollations", so if with 0fq's, lots of combinations can generate hits and 
> it doesn't need to try very many to get to 10.  But with more fq's, fewer 
> collations will pan out so now it is trying more up to 100 before (if ever) 
> it gets to 10.

It does just fine doing 100 collations so long as there are no FQs.  It seems 
to me that the FQs are taking an inordinate amount of extra time.  100 
collations in (roughly) the same amount of time as a single collation, so long 
as there are no FQs.  Why are the FQs such a drag on the collation process?


> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.


>  So say with 2fq's it takes 10ms for the query to complete with spellcheck 
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take 
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so 
long as I have FQs off.  Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so 
long as I have FQs off.  Add a single FQ and it becomes 62038ms.


> But I think you're just setting maxCollationTries too high.  You're asking it 
> to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 
tries.  That's a big difference to the user when it's trying to figure out 
misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance





Escaping & character at Query

2013-05-29 Thread Furkan KAMACI
I use Solr 4.2.1 and I analyze that keyword:

kelile&dimle

at admin page:

WT

kelile&dimle

SF

kelile&dimle

TLCF

kelile&dimle

However, when I escape that character and search for it:

solr/select?q=kelile\&dimle

here is what I see:



0
148
 
  
  *kelile\*
 


I have edismax as the default query parser. How can I escape the "&"
character, and why doesn't it accept it?:

kelile\&dimle

Any ideas?


RE: Choosing specific fields for suggestions in SpellCheckerComponent

2013-05-29 Thread Dyer, James
I assume here you've got a spellcheck field like this:







...so that a check against "Spelling_Dictionary" always checks all 4, right?  
This is the only way I know to approximate having it spellcheck across multiple 
fields.  And as you have found, short of creating several separate versions of 
"Spelling_Dictionary", there is no way to specify the individual fields a la 
carte.  Although not supported, some of the work was done as part of SOLR-2993.

Your best bet now is to use "Spelling_Dictionary" as a master dictionary, then 
use "maxCollationTries" to have it generate collations that only pertain to 
what the user actually searched against.  This is less efficient and may not 
work well (or at all) with Suggest.
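The schema snippet stripped from the top of this message would have looked
roughly like the following sketch (field and type names are assumptions; the
point is only that all four source fields are copied into one catch-all
dictionary field):

    <!-- catch-all field used only to build the spellcheck dictionary -->
    <field name="Spelling_Dictionary" type="text_general"
           indexed="true" stored="false" multiValued="true"/>

    <copyField source="field1" dest="Spelling_Dictionary"/>
    <copyField source="field2" dest="Spelling_Dictionary"/>
    <copyField source="field3" dest="Spelling_Dictionary"/>
    <copyField source="field4" dest="Spelling_Dictionary"/>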

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Wilson Passos [mailto:wrpas...@gmail.com] 
Sent: Tuesday, May 28, 2013 11:54 PM
To: Solr User List
Subject: Choosing specific fields for suggestions in SpellCheckerComponent

Hi everyone,


I've been searching about how to configure the SpellCheckerComponent in 
Solr 4.0 to support suggestion queries based on s subset of the 
configured fields in schema.xml. Let's say the spell checking is 
configured to use these 4 fields:






I'd like to know if there's any possibility to dynamically set the 
SpellCheckerComponent to suggest terms using just fields "field2" and 
"field3" instead of the default behavior, which always includes 
suggestions across the 4 defined fields.

Thanks in advance for any help!




Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Andy Lester

On May 29, 2013, at 9:46 AM, "Dyer, James"  wrote:

> Just an instanity check, I see I had misspelled "maxCollations" as 
> "maxCollation" in my prior response.  When you tested with this set the same 
> as "maxCollationTries", did you correct my spelling?

Yes, definitely.

Thanks for the ticket.  I am looking at the effects of turning on 
spellcheck.onlyMorePopular to true, which reduces the number of collations it 
seems to do, but doesn't affect the underlying question of "is the spellchecker 
doing FQs properly?"

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Escaping & character at Query

2013-05-29 Thread Carlos Bonilla
Hi,
try with double quotation marks (" ").

Carlos.


2013/5/29 Furkan KAMACI 

> I use Solr 4.2.1 and I analyze that keyword:
>
> kelile&dimle
>
> at admin page:
>
> WT
>
> kelile&dimle
>
> SF
>
> kelile&dimle
>
> TLCF
>
> kelile&dimle
>
> However when I escape that charter and search it:
>
> solr/select?q=kelile\&dimle
>
> here is what I see:
>
> 
> 
> 0
> 148
>  
>   
>   *kelile\*
>  
> 
>
> I have edismax as default query parser. How can I escape that "&"
> character, why it doesn't like that?:
>
> kelile\&dimle
>
> Any ideas?
>


using HTTP caching with shards in Solr 4.3

2013-05-29 Thread Ty
Hello,

I'd like to take advantage of Solr's HTTP caching feature (httpCaching
never304="false" in solrconfig.xml).  It is behaving as expected when I do
a standard query against a Solr instance and then repeat it: I receive an
HTTP 304 (Not Modified) response.
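For reference, that setting lives in the requestDispatcher section of
solrconfig.xml; a minimal sketch (the cacheControl value here is illustrative
only):

    <requestDispatcher handleSelect="true">
      <!-- never304="false" lets Solr compute ETag/Last-Modified headers and
           answer conditional GETs with 304 Not Modified -->
      <httpCaching never304="false" lastModifiedFrom="openTime">
        <cacheControl>max-age=30, public</cacheControl>
      </httpCaching>
    </requestDispatcher>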

However, when using the "shards" functionality, I seem to be unable to get
the HTTP304 functionality.  When sending a request to a Solr instance that
includes other Solr instances in the "shards" parameter, a GET request is
sent to the original Solr instance, but it turns around and sends POST
requests to the Solr instances referenced in "shards".  Since POST requests
cannot generate a 304, I seem to be unable to use HTTP caching with shards.

Is there a way to make the original Solr instance query the shards with a
GET method?  Or some other way I can leverage HTTP caching when using
shards?

Thanks,
Ty


[Announce] Apache Solr 4.3 with RankingAlgorithm 1.4.8 available now -- includes realtime-search with multiple granularities (correction)

2013-05-29 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.3 with 
RankingAlgorithm40 1.4.8, featuring realtime-search with multiple 
granularities. realtime-search is very fast NRT and allows you not 
only to look up a document by id but also to search in realtime; 
see http://tgels.org/realtime-nrt.jsp. The update performance is about 
70,000 docs / sec. The query performance is in ms, allowing you to query 
a 10m wikipedia index (complete index) in <50 ms.


This release includes realtime-search with multiple granularities, 
request/intra-request. The granularity attribute controls the NRT 
behavior. With attribute granularity="request", all search components 
like search, faceting, highlighting, etc. will see a consistent view of 
the index and will all report the same number of documents. With 
granularity="intrarequest", the components may each report the most 
recent changes to the index. realtime-search has been contributed back 
to Apache Solr, see https://issues.apache.org/jira/browse/SOLR-3816.


RankingAlgorithm 1.4.8 supports the entire Lucene Query Syntax, ± and/or 
boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix 
queries with boosting, etc. and is compatible with the lucene 4.3 api.


You can get more information about realtime-search performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.3 with RankingAlgorithm40 1.4.8 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org

Note:
1. Apache Solr 4.3 with RankingAlgorithm40 1.4.8 is an external project.






Re: Escaping & character at Query

2013-05-29 Thread Furkan KAMACI
When I write:

solr/select?q="kelile\&dimle"

it still says:



*"kelile\*




2013/5/29 Carlos Bonilla 

> Hi,
> try with double quotation marks (" ").
>
> Carlos.
>
>
> 2013/5/29 Furkan KAMACI 
>
> > I use Solr 4.2.1 and I analyze that keyword:
> >
> > kelile&dimle
> >
> > at admin page:
> >
> > WT
> >
> > kelile&dimle
> >
> > SF
> >
> > kelile&dimle
> >
> > TLCF
> >
> > kelile&dimle
> >
> > However when I escape that charter and search it:
> >
> > solr/select?q=kelile\&dimle
> >
> > here is what I see:
> >
> > 
> > 
> > 0
> > 148
> >  
> >   
> >   *kelile\*
> >  
> > 
> >
> > I have edismax as default query parser. How can I escape that "&"
> > character, why it doesn't like that?:
> >
> > kelile\&dimle
> >
> > Any ideas?
> >
>


Re: Escaping & character at Query

2013-05-29 Thread Jack Krupansky

You need to URL-encode the & as %26:

...solr/select?q=kelile%26dimle

Normally, & introduces a new URL query parameter in the URL.
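A small client-side sketch of the same point (the parameter value is the one
from this thread): URL-encode the whole query value, which turns & into %26 so
it no longer starts a new URL parameter.

    import java.net.URLEncoder;

    public class EncodeQuery {
        public static void main(String[] args) throws Exception {
            String q = "kelile&dimle";
            // encode the parameter value, not the full URL
            String encoded = URLEncoder.encode(q, "UTF-8");
            System.out.println("solr/select?q=" + encoded);
            // prints: solr/select?q=kelile%26dimle
        }
    }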

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI 
Sent: Wednesday, May 29, 2013 10:55 AM 
To: solr-user@lucene.apache.org 
Subject: Escaping & character at Query 


I use Solr 4.2.1 and I analyze that keyword:

kelile&dimle

at admin page:

WT

kelile&dimle

SF

kelile&dimle

TLCF

kelile&dimle

However when I escape that charter and search it:

solr/select?q=kelile\&dimle

here is what I see:



0
148

 
 *kelile\*



I have edismax as default query parser. How can I escape that "&"
character, why it doesn't like that?:

kelile\&dimle

Any ideas?


Re: Escaping & character at Query

2013-05-29 Thread Carlos Bonilla
Hi, I meant:

solr/select?q="kelile&dimle"

Cheers.



2013/5/29 Jack Krupansky 

> You need to UUEncode the & with %26:
>
> ...solr/select?q=kelile%**26dimle
>
> Normally, & introduces a new URL query parameter in the URL.
>
> -- Jack Krupansky
>
> -Original Message- From: Furkan KAMACI Sent: Wednesday, May 29,
> 2013 10:55 AM To: solr-user@lucene.apache.org Subject: Escaping &
> character at Query
> I use Solr 4.2.1 and I analyze that keyword:
>
> kelile&dimle
>
> at admin page:
>
> WT
>
> kelile&dimle
>
> SF
>
> kelile&dimle
>
> TLCF
>
> kelile&dimle
>
> However when I escape that charter and search it:
>
> solr/select?q=kelile\&dimle
>
> here is what I see:
>
> 
> 
> 0
> 148
> 
>  
>  *kelile\*
> 
> 
>
> I have edismax as default query parser. How can I escape that "&"
> character, why it doesn't like that?:
>
> kelile\&dimle
>
> Any ideas?
>


Re: Problem with xpath expression in data-config.xml

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker
wrote:

> Replacing the contents of solr-4.3.0\example\example-**
> DIH\solr\rss\conf\rss-data-**config.xml
>
> by
>
> 
>
>
>http://beautybooks88.*
> *blogspot.com/feeds/posts/**default"
> processor="**XPathEntityProcessor" forEach="/feed/entry" transformer="**
> DateFormatTransformer">
>  commonField="true" />
>  xpath="/feed/link[@rel='self']**/@href" commonField="true" />
>
> 
> 
>  xpath="/feed/entry/content" stripHTML="true"/>
>  />
>  xpath="/feed/entry/category/@**term"/>
>  dateTimeFormat="-MM-dd'T'**HH:mm:ss" />
> 
>
> 
>
> and running the full dataimport from http://localhost:8983/solr/#/
> rss/dataimport// results in an error.
>
> 1) How could I have found the reason faster than I did - by looking into
> which log files,?
>
>
DIH uses the same log file as solr. The name/location of the log file
depends on your logging configuration.


> 2) If you remove the first occurrence of /@href above, the import
> succeeds. (Note that the same pattern works for column "link".) What's the
> reason why?!!
>

I think there is a bug here. In my tests, xpath="/root/a/@y"
works, xpath="/root/a[@x='1']/@y" also works. But if you use them together
the one which is defined last returns null. I'll open an issue.


-- 
Regards,
Shalin Shekhar Mangar.


Re: Escaping & character at Query

2013-05-29 Thread Jack Krupansky

So, make it:

solr/select?q="kelile%26dimle"

-- Jack Krupansky

-Original Message- 
From: Carlos Bonilla 
Sent: Wednesday, May 29, 2013 11:39 AM 
To: solr-user@lucene.apache.org 
Subject: Re: Escaping & character at Query 


Hi, I meant:

solr/select?q="kelile&dimle"

Cheers.



2013/5/29 Jack Krupansky 


You need to UUEncode the & with %26:

...solr/select?q=kelile%**26dimle

Normally, & introduces a new URL query parameter in the URL.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Wednesday, May 29,
2013 10:55 AM To: solr-user@lucene.apache.org Subject: Escaping &
character at Query
I use Solr 4.2.1 and I analyze that keyword:

kelile&dimle

at admin page:

WT

kelile&dimle

SF

kelile&dimle

TLCF

kelile&dimle

However when I escape that charter and search it:

solr/select?q=kelile\&dimle

here is what I see:



0
148

 
 *kelile\*



I have edismax as default query parser. How can I escape that "&"
character, why it doesn't like that?:

kelile\&dimle

Any ideas?



Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Nicholas Fellows
I also have problems getting the solrspellchecker to utilise existing FQ
params correctly.
we have some fairly monster queries

eg : http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating
results.
In essence I am getting collations that yield no results when the filter
query is applied.

We have items that are by default not shown when out of stock or
forthcoming. The user
can select whether to show these or not.

Is there something wrong with my query or perhaps my use case is not
supported?

I'm using nested queries and local params etc.

Would very much appreciate some assistance on this one, as 2 days' worth of
hacking and pestering
people on IRC have not yet yielded a solution for me. I'm not even sure whether
what I am trying
is even possible! Some sort of clarification on this would really help!

Cheers

Nick...




On 29 May 2013 15:57, Andy Lester  wrote:

>
> On May 29, 2013, at 9:46 AM, "Dyer, James" 
> wrote:
>
> > Just an instanity check, I see I had misspelled "maxCollations" as
> "maxCollation" in my prior response.  When you tested with this set the
> same as "maxCollationTries", did you correct my spelling?
>
> Yes, definitely.
>
> Thanks for the ticket.  I am looking at the effects of turning on
> spellcheck.onlyMorePopular to true, which reduces the number of collations
> it seems to do, but doesn't affect the underlying question of "is the
> spellchecker doing FQs properly?"
>
> Thanks,
> Andy
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
>
>


-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com


Re: Not able to search Spanish word with ascent in solr

2013-05-29 Thread jignesh
Solr returns error 500 when I post data with accented chars...

Any solution for that?






Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor

2013-05-29 Thread jerome . dupont


The configuration works with LineEntityProcessor, with few documents (I haven't
tested with many documents yet).
For information, this is the config:








... fields definition

file:///D:/jed/noticesBib/listeNotices.txt contains the following lines:
jed/noticesBib/3/4/307/34307035.xml
jed/noticesBib/3/4/307/34307082.xml
jed/noticesBib/3/4/307/34307110.xml
jed/noticesBib/3/4/307/34307197.xml
jed/noticesBib/3/4/307/34307350.xml
jed/noticesBib/3/4/307/34307399.xml
...
(It could have contained the full location from the beginning, but I wanted to
test the concatenation of the filename.)
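Since the archive stripped the actual data-config.xml out of this message, here
is a rough sketch of the kind of setup being described (entity names, the
forEach value, and the base-directory/rawLine wiring are assumptions):

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <!-- outer entity: one row per line of the list file -->
        <entity name="files" processor="LineEntityProcessor"
                url="file:///D:/jed/noticesBib/listeNotices.txt"
                rootEntity="false">
          <!-- inner entity: parse the XML file named on each line,
               concatenating the base directory with the relative path -->
          <entity name="notice" processor="XPathEntityProcessor"
                  url="file:///D:/${files.rawLine}"
                  forEach="/record">
            <!-- ... field definitions ... -->
          </entity>
        </entity>
      </document>
    </dataConfig>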

That works fine, thanks for the help!!

Next step, the same without using a file. (I'll write it in another post).

Regards,
Jérôme


Re: Problem with xpath expression in data-config.xml

2013-05-29 Thread Shalin Shekhar Mangar
I created https://issues.apache.org/jira/browse/SOLR-4875


On Wed, May 29, 2013 at 9:15 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

>
> On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker  > wrote:
>
>> Replacing the contents of solr-4.3.0\example\example-**
>> DIH\solr\rss\conf\rss-data-**config.xml
>>
>> by
>>
>> 
>>
>>
>>http://beautybooks88.
>> **blogspot.com/feeds/posts/**default"
>> processor="**XPathEntityProcessor" forEach="/feed/entry" transformer="**
>> DateFormatTransformer">
>> > commonField="true" />
>> > xpath="/feed/link[@rel='self']**/@href" commonField="true" />
>>
>> 
>> > xpath="/feed/entry/link[@rel='**self']/@href" />
>> > xpath="/feed/entry/content" stripHTML="true"/>
>> > xpath="/feed/entry/author" />
>> > xpath="/feed/entry/category/@**term"/>
>> > dateTimeFormat="-MM-dd'T'**HH:mm:ss" />
>> 
>>
>> 
>>
>> and running the full dataimport from http://localhost:8983/solr/#/**
>> rss/dataimport//dataimportresults
>>  in an error.
>>
>> 1) How could I have found the reason faster than I did - by looking into
>> which log files,?
>>
>>
> DIH uses the same log file as solr. The name/location of the log file
> depends on your logging configuration.
>
>
>> 2) If you remove the first occurrence of /@href above, the import
>> succeeds. (Note that the same pattern works for column "link".) What's the
>> reason why?!!
>>
>
> I think there is a bug here. In my tests, xpath="/root/a/@y"
> works, xpath="/root/a[@x='1']/@y" also works. But if you use them together
> the one which is defined last returns null. I'll open an issue.
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Not able to search Spanish word with ascent in solr

2013-05-29 Thread Gora Mohanty
On 29 May 2013 21:39, jignesh  wrote:
> Solr returning error 500, when i post data with ascent chars...
>
> Any solution for that?
[...]

Please look in the Solr logs for the
appropriate error message.

Regards,
Gora


Solr Cloud Using Zookeeper SASL

2013-05-29 Thread Don Tran
Hiya all,

Got a question that I hope someone can help me with.
I was just wondering if anyone has ever used Solr Cloud using Zookeepers that 
have SASL authentication turned on?
I can't seem to find any documentation on it so any help at all would be 
amazing!

Thanks,


Don Tran
Developer
Omnifone
Island Studios
47 British Grove
London W4 2NL, UK
T: +44 (0)20 8600 0580
F: +44 (0)20 8600 0581
S:  DonTranOmnifone
E:  dt...@omnifone.com



RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Instead of "maxCollationTries=0", use a value greater than zero.  Zero means 
not to check if the collation will return hits.  1 means to test 1 possible 
combination against the index and return it only if it returns hits.  2 tries 
up to 2 possibilities, etc.  As you have "spellcheck.maxCollations=8", you'll 
probably want maxCollationTries at least that large.  Maybe 10-20 would be 
better.  Make it as low as possible to get generally good results, or as high 
as possible before the performance on a query with many misspelled words gets 
too bad.

Also, use a spellcheck.count greater than 2.  This is as many corrections per 
misspelled term you want it to consider.  If using DirectSolrSpellChecker, you 
can have it set low, 5-10 might be good.  If using IndexBased- or FileBased 
spell checkers, use at least 10.

Also, do not use "onlyMorePopular" unless you indeed want every term in the 
user's query to be replaced with higher-frequency terms (even correctly-spelled 
terms get replaced).  If you want it to suggest even for words that are in the 
dictionary, try "spellcheck.alternativeTermCount" instead.  Try setting it to 
about half of "spellcheck.count" (but at least 10 if using IndexBased- or 
FileBased spell checkers).
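Pulling those rules of thumb together, the spellcheck defaults on a request
handler might look roughly like this (a sketch only; the numbers are the
suggestions above, assuming a DirectSolrSpellChecker dictionary):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.alternativeTermCount">5</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.maxCollations">8</str>
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>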

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nicholas Fellows [mailto:n...@djdownload.com] 
Sent: Wednesday, May 29, 2013 11:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

I also have problems getting the solrspellchecker to utilise existing FQ
params correctly.
we have some fairly monster queries

eg : http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating
results.
In essence i am getting collations that yield no results when the filter
query is applied.

We have items that are by default not shown when out of stock or
forthcoming. the user
can select whether to show these or not.

Is there something wrong with my query or perhaps my use case is not
supported?

Im using nested query and local params etc

Would very much appreciate some assistance on this one as 2days worth of
hacking, and pestering
people on IRC have not yet yeilded a solution for me. Im not even sure what
i am trying
is even possible! Some sort of clarification on this would really help!

Cheers

Nick...




On 29 May 2013 15:57, Andy Lester  wrote:

>
> On May 29, 2013, at 9:46 AM, "Dyer, James" 
> wrote:
>
> > Just an instanity check, I see I had misspelled "maxCollations" as
> "maxCollation" in my prior response.  When you tested with this set the
> same as "maxCollationTries", did you correct my spelling?
>
> Yes, definitely.
>
> Thanks for the ticket.  I am looking at the effects of turning on
> spellcheck.onlyMorePopular to true, which reduces the number of collations
> it seems to do, but doesn't affect the underlying question of "is the
> spellchecker doing FQs properly?"
>
> Thanks,
> Andy
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
>
>


-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com



Re: Not able to search Spanish word with ascent in solr

2013-05-29 Thread Raymond Wiker
On May 29, 2013, at 18:09 , jignesh  wrote:
> Solr returning error 500, when i post data with ascent chars...
> 
> Any solution for that?

The solution probably involves using the correct encoding, and ensuring that 
the HTTP request sets the appropriate header values accordingly.

In other words, more likely a pilot error than a SOLR error... at least that 
was the case for me :-)
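A bare-bones sketch of what "correct encoding plus a matching header" means in
practice (the URL, field names and document content are assumptions):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PostUtf8Doc {
        public static void main(String[] args) throws Exception {
            String xml = "<add><doc>"
                    + "<field name=\"id\">es-1</field>"
                    + "<field name=\"title\">cañón</field>"
                    + "</doc></add>";

            URL url = new URL("http://localhost:8983/solr/update?commit=true");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            // declare the charset that matches the bytes actually sent
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");

            try (OutputStream out = conn.getOutputStream()) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }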

Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Shalin Shekhar Mangar
James, this is very useful information. Can you please add this to the wiki?


On Wed, May 29, 2013 at 10:36 PM, Dyer, James
wrote:

> Instead of "maxCollationTries=0", use a value greater than zero.  Zero
> means not to check if the collation will return hits.  1 means to test 1
> possible combination against the index and return it only if it returns
> hits.  2 tries up to 2 possibilities, etc.  As you have
> "spellcheck.maxCollations=8", you'll probably want maxCollationTries at
> least that large.  Maybe 10-20 would be better.  Make it as low as possible
> to get generally good results, or as high as possible before the
> performance on a query with many misspelled words gets too bad.
>
> Also, use a spellcheck.count greater than 2.  This is as many corrections
> per misspelled term you want it to consider.  If using
> DirectSolrSpellChecker, you can have it set low, 5-10 might be good.  If
> using IndexBased- or FileBased spell checkers, use at least 10.
>
> Also, do not use "onlyMorePopular" unless you indeed want every term in
> the user's query to be replaced with higher-frequency terms (even
> correctly-spelled terms get replaced).  If you want it to suggest even for
> words that are in the dictionary, try "spellcheck.alternativeTermCount"
> instead.  Try setting it to about half of "spellcheck.count" (but at least
> 10 if using IndexBased- or FileBased spell checkers).
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Nicholas Fellows [mailto:n...@djdownload.com]
> Sent: Wednesday, May 29, 2013 11:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Why do FQs make my spelling suggestions so slow?
>
> I also have problems getting the solrspellchecker to utilise existing FQ
> params correctly.
> we have some fairly monster queries
>
> eg : http://pastebin.com/4XzGpfeC
>
> I cannot seem to get our FQ parameters to be honored when generating
> results.
> In essence i am getting collations that yield no results when the filter
> query is applied.
>
> We have items that are by default not shown when out of stock or
> forthcoming. the user
> can select whether to show these or not.
>
> Is there something wrong with my query or perhaps my use case is not
> supported?
>
> Im using nested query and local params etc
>
> Would very much appreciate some assistance on this one as 2days worth of
> hacking, and pestering
> people on IRC have not yet yeilded a solution for me. Im not even sure what
> i am trying
> is even possible! Some sort of clarification on this would really help!
>
> Cheers
>
> Nick...
>
>
>
>
> On 29 May 2013 15:57, Andy Lester  wrote:
>
> >
> > On May 29, 2013, at 9:46 AM, "Dyer, James"  >
> > wrote:
> >
> > > Just an instanity check, I see I had misspelled "maxCollations" as
> > "maxCollation" in my prior response.  When you tested with this set the
> > same as "maxCollationTries", did you correct my spelling?
> >
> > Yes, definitely.
> >
> > Thanks for the ticket.  I am looking at the effects of turning on
> > spellcheck.onlyMorePopular to true, which reduces the number of
> collations
> > it seems to do, but doesn't affect the underlying question of "is the
> > spellchecker doing FQs properly?"
> >
> > Thanks,
> > Andy
> >
> > --
> > Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
> >
> >
>
>
> --
> Nick Fellows
> DJdownload.com
> ---
> 10 Greenland Street
> London
> NW10ND
> United Kingdom
> ---
> n...@djdownload.com (E)
>
> ---
> www.djdownload.com
>
>


-- 
Regards,
Shalin Shekhar Mangar.


RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
It has been in the wiki, more or less.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and following 
sections.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, May 29, 2013 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

James, this is very useful information. Can you please add this to the wiki?


On Wed, May 29, 2013 at 10:36 PM, Dyer, James
wrote:

> Instead of "maxCollationTries=0", use a value greater than zero.  Zero
> means not to check if the collation will return hits.  1 means to test 1
> possible combination against the index and return it only if it returns
> hits.  2 tries up to 2 possibilities, etc.  As you have
> "spellcheck.maxCollations=8", you'll probably want maxCollationTries at
> least that large.  Maybe 10-20 would be better.  Make it as low as possible
> to get generally good results, or as high as possible before the
> performance on a query with many misspelled words gets too bad.
>
> Also, use a spellcheck.count greater than 2.  This is as many corrections
> per misspelled term you want it to consider.  If using
> DirectSolrSpellChecker, you can have it set low, 5-10 might be good.  If
> using IndexBased- or FileBased spell checkers, use at least 10.
>
> Also, do not use "onlyMorePopular" unless you indeed want every term in
> the user's query to be replaced with higher-frequency terms (even
> correctly-spelled terms get replaced).  If you want it to suggest even for
> words that are in the dictionary, try "spellcheck.alternativeTermCount"
> instead.  Try setting it to about half of "spellcheck.count" (but at least
> 10 if using IndexBased- or FileBased spell checkers).
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Nicholas Fellows [mailto:n...@djdownload.com]
> Sent: Wednesday, May 29, 2013 11:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Why do FQs make my spelling suggestions so slow?
>
> I also have problems getting the solrspellchecker to utilise existing FQ
> params correctly.
> we have some fairly monster queries
>
> eg : http://pastebin.com/4XzGpfeC
>
> I cannot seem to get our FQ parameters to be honored when generating
> results.
> In essence i am getting collations that yield no results when the filter
> query is applied.
>
> We have items that are by default not shown when out of stock or
> forthcoming. the user
> can select whether to show these or not.
>
> Is there something wrong with my query or perhaps my use case is not
> supported?
>
> Im using nested query and local params etc
>
> Would very much appreciate some assistance on this one as 2days worth of
> hacking, and pestering
> people on IRC have not yet yeilded a solution for me. Im not even sure what
> i am trying
> is even possible! Some sort of clarification on this would really help!
>
> Cheers
>
> Nick...
>
>
>
>
> On 29 May 2013 15:57, Andy Lester  wrote:
>
> >
> > On May 29, 2013, at 9:46 AM, "Dyer, James"  >
> > wrote:
> >
> > > Just an instanity check, I see I had misspelled "maxCollations" as
> > "maxCollation" in my prior response.  When you tested with this set the
> > same as "maxCollationTries", did you correct my spelling?
> >
> > Yes, definitely.
> >
> > Thanks for the ticket.  I am looking at the effects of turning on
> > spellcheck.onlyMorePopular to true, which reduces the number of
> collations
> > it seems to do, but doesn't affect the underlying question of "is the
> > spellchecker doing FQs properly?"
> >
> > Thanks,
> > Andy
> >
> > --
> > Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
> >
> >
>
>
> --
> Nick Fellows
> DJdownload.com
> ---
> 10 Greenland Street
> London
> NW10ND
> United Kingdom
> ---
> n...@djdownload.com (E)
>
> ---
> www.djdownload.com
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
The comment here is clearly wrong, since there is no division by two.

I think that the code is wrong, because this results in not starting
runners when it should start runners. Am I misanalyzing?

if (runners.isEmpty() || (queue.remainingCapacity() < queue.size()
    // queue is half full and we can add more runners
    && runners.size() < threadCount)) {


Re: Indexing Solr, Multiple Doc Types. Production of Multiple Values for UniqueKey Field Using TemplateTransformer

2013-05-29 Thread Chris Hostetter

: org.apache.solr.common.SolrException: Document contains multiple values for
: uniqueKey field: uid=[A_1, dc1999fcf12df900]

By the looks of things, your TemplateTransformer is properly creating a 
value of "A_${atest.id}" where "${atest.id} == 1" for that document ... 
the problem seems to be that somehow another value is getting put in your 
uid field containing "dc1999fcf12df900"

Based on your stack trace, i suspect that in addition to having DIH 
create a value for your "uid" field, you also have 
SignatureUpdateProcessorFactory configured (in your solrconfig.xml) to 
generate a synthetic unique id based on the signature of some fields as 
well...

: 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:194)
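For context, the kind of solrconfig.xml chain being described looks roughly
like the sketch below (adapted from the stock deduplication example; the field
names are assumptions based on the stack trace). If signatureField points at
the same uid field that DIH already populates, the signature is added as a
second value and the uniqueKey error above is the result:

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <!-- pointing this at the DIH-populated uniqueKey causes the conflict -->
        <str name="signatureField">uid</str>
        <bool name="overwriteDupes">false</bool>
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>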


-Hoss


Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Shalin Shekhar Mangar
On Wed, May 29, 2013 at 11:29 PM, Benson Margulies wrote:

> The comment here is clearly wrong, since there is no division by two.
>
> I think that the code is wrong, because this results in not starting
> runners when it should start runners. Am I misanalyzing?
>
> if (runners.isEmpty() || (queue.remainingCapacity() < queue.size() // queue
>
>   // is
>
>   // half
>
>   // full
>
>   // and
>
>   // we
>
>   // can
>
>   // add
>
>   // more
>
>   // runners
>   && runners.size() < threadCount)) {
>


queue.remainingCapacity() returns capacity - queue.size() so the comment is
correct.
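In other words, for a bounded queue the condition is just a "more than half
full" test; a tiny standalone check of the arithmetic (not the Solr code
itself):

    import java.util.concurrent.LinkedBlockingQueue;

    public class HalfFullCheck {
        public static void main(String[] args) throws Exception {
            LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
            for (int i = 1; i <= 10; i++) {
                queue.put("doc" + i);
                // remainingCapacity() == capacity - size(), so this flips to
                // true exactly when size() exceeds half the capacity
                boolean moreThanHalfFull =
                        queue.remainingCapacity() < queue.size();
                System.out.println("size=" + queue.size()
                        + " moreThanHalfFull=" + moreThanHalfFull);
            }
        }
    }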

-- 
Regards,
Shalin Shekhar Mangar.


Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
Ah. So now I have to find some other explanation of why it never
creates more than one thread, even when I make a very deep queue and
specify 6 threads.

On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar
 wrote:
> On Wed, May 29, 2013 at 11:29 PM, Benson Margulies 
> wrote:
>
>> The comment here is clearly wrong, since there is no division by two.
>>
>> I think that the code is wrong, because this results in not starting
>> runners when it should start runners. Am I misanalyzing?
>>
>> if (runners.isEmpty() || (queue.remainingCapacity() < queue.size() // queue
>>
>>   // is
>>
>>   // half
>>
>>   // full
>>
>>   // and
>>
>>   // we
>>
>>   // can
>>
>>   // add
>>
>>   // more
>>
>>   // runners
>>   && runners.size() < threadCount)) {
>>
>
>
> queue.remainingCapacity() returns capacity - queue.size() so the comment is
> correct.
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
Hoss, for some reason this doesn't work when I pass the latlong value via
query..

This is the query.. It just returns all the values for fname='peter'
(doesn't filter for Tarmac, Florida).

fl=*,score&rows=10&qt=findperson&fps_latlong=26.22084,-80.29&fps_fname=peter

*solrconfig.xml*


{!switch case='*:*' default=$fq_bbox
v=$fps_latlong}


_query_:"{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}"


*Works when used via custom component:*

This works fine when the latlong value is passed via custom component. We
have a custom component which gets the location name via query, calculates
the corresponding lat long co-ordinates stored in TSV file and passes the
co-ordinates to the query.


*Custom component config:*

 centroids.tsv
fps_where
fps_latitude
fps_longitude
fps_latlong
fps_dist
48.2803
1.0
  

*Custom component query:*
fl=*,score&rows=10&fps_where="new york,
ny"&qt=findperson&fps_latlong=26.22084,-80.29&fps_dist=.10&fps_fname=peter

Is it a bug?





Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread Chris Hostetter

: Hoss, for some reason this doesn't work when I pass the latlong value via
: query..
...
: fl=*,score&rows=10&qt=findperson&fps_latlong=26.22084,-80.29&fps_fname=peter

Hmmm, are these appends & invariants on your "findperson" requestHandler?

What does debugQuery=true show you the applied filters are?

: 
: _query_:"{!bbox pt=$fps_latlong sfield=geo
: d=$fps_dist}"
: 

Why do you have the _query_ hack in there?  i haven't had a chance to test 
this, but perhaps that hack doesn't play nicely with localparam variable 
substitution? it should just be...

   {!bbox pt=$fps_latlong sfield=geo d=$fps_dist}

: This works fine when the latlong value is passed via custom component. We
: have a custom component which gets the location name via query, calculates
: the corresponding lat long co-ordinates stored in TSV file and passes the
: co-ordinates to the query.


Ok wait a minute -- all bets are off about this working if you have a 
custom component in the mix adding/removing params.  You need to provide 
us with more details about exactly how your component works, where it's 
configured in the component list, and how it is adding the "fps_latlong" 
param it generates to the query, because my guess is that one of two things 
is happening:

1) your component is doing its logic after the query parsing has already 
happened and the variables have been evaluated -- at which point 
fps_latlong isn't set yet, so you get the case='*:*' behavior

2) your component is doing its logic before the query parsing happens, 
but it is setting the value of fps_latlong in a way that the query parsing 
code doesn't see when resolving the local variables.


-Hoss


Problem with PatternReplaceCharFilter

2013-05-29 Thread jasimop
Hi,

I have a problem when using PatternReplaceCharFilter when indexing a field.
I created the following field: 

  




  
  



  


And I created a field that is indexed and stored:


I need to index a document with such a structure in this field:


Basically I have some sort of XML structure, i need only to search in the
"content" attribute, but when highlighting i need to get back to the
enclosing XML tags.

So with the 3 Regex I want to remove all unwanted tags and tokenize/index
only the important data.
I know that I could use HTMLStripCharFilterFactory but then also the tag
names, attribute names and values get indexed. And I don't want to search in
that content too.

I read the following in the doc:
NOTE: If you produce a phrase that has different length to source string and
the field is used for highlighting for a term of the phrase, you will face a
trouble. 

The thing is, why is this the case? When running the analysis from the Solr admin
page, the CharFilters generate
"the content to search in the second content line" which looks perfect, but
then the StandardTokenizer
gets the start and end positions of the tokens wrong. Why is this the case?
Does there exist another solution to my problem?
Could I use the following method I saw in the doc of
PatternReplaceCharFilter:
protected int correct(int currentOff) Documentation: Retrieve the corrected
offset.

How could I solve such a task?








Support for Mongolian language

2013-05-29 Thread Sagar Chaturvedi
Hi All,

Does solr provide support for Mongolian language?

Also which filters and tokenizers must be used for Chinese, Japanese and Korean 
languages?

Regards,
Sagar Chaturvedi




Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
Ok..I removed all my custom components from findperson request handler..

  

  lucene
  explicit
  10
  AND
  person_name_all_i
  50
  32
  
  *:*
  
  
{!switch case='*:*' default=$fq_bbox
v=$fps_latlong}


_query_:"{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}"



query
  debug

  


My query:
select?fl=*,score&rows=10&qt=findperson&fps_latlong=42.3482,-75.1890

The above query just returns everything back from SOLR (should only return
results corresponding to lat and long values passed in the query)...

I even tried changing the below hack, but got the same results.

{!bbox pt=$fps_latlong sfield=geo
d=$fps_dist}

Not sure if I am missing something...






Re: Support for Mongolian language

2013-05-29 Thread bbarani
Check out:

wiki.apache.org/solr/LanguageAnalysis

For some reason the above site takes a long time to open.








Re: Query syntax error: Cannot parse ....

2013-05-29 Thread bbarani
# has a separate meaning in a URL. You need to encode it.

http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters.





Re: Grouping results based on the field which matched the query

2013-05-29 Thread bbarani
Not sure if you are looking for this..

http://wiki.apache.org/solr/FieldCollapsing





Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread Chris Hostetter

: 
...
:   
: {!switch case='*:*' default=$fq_bbox
: v=$fps_latlong}
: 
: 
: _query_:"{!bbox pt=$fps_latlong sfield=geo
: d=$fps_dist}"
: 
: 

...you have your "appends" and "invariants" nested inside your defaults -- 
they should be siblings...

 <lst name="defaults">
   ...
 </lst>
 <lst name="appends">
   ...
 </lst>
 <lst name="invariants">
   ...
 </lst>

-Hoss
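Spelled out with the parameters from this thread, the corrected handler would
look roughly like this (an illustrative sketch, not a verified config; whether
fq_bbox belongs in invariants or defaults is a separate decision):

    <requestHandler name="/findperson" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
      </lst>
      <!-- appends and invariants are siblings of defaults, not nested in it -->
      <lst name="appends">
        <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
      </lst>
      <lst name="invariants">
        <str name="fq_bbox">{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}</str>
      </lst>
    </requestHandler>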


Re: SOLR 4.3.0 - How to make fq optional?

2013-05-29 Thread bbarani
 I totally missed that..Sorry about that :)...It seems to work fine now...





RE: Note on The Book

2013-05-29 Thread Markus Jelsma
Jack,

I'd prefer tons of information instead of a meager 300 page book that leaves a 
lot of questions. I'm looking forward to a paperback or hardcover book and 
price doesn't really matter, it is going to be worth it anyway.

Thanks,
Markus

 
 
-Original message-
> From:Jack Krupansky 
> Sent: Wed 29-May-2013 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: Note on The Book
> 
> Erick, your point is well taken. Although my primary interest/skill is to 
> produce a solid foundation reference (including tons of examples), the real 
> goal is to then build on top of that foundation.
> 
> While I focus on the hard-core material - which really does include some 
> narrative and lots of examples in addition to tons of "mere" reference, my 
> co-author, Ryan Tabora, will focus almost exclusively on... narrative and 
> diagrams.
> 
> And when I say reference, I also mean lots of examples. Even as the 
> hard-core reference stabilizes, the examples will continue to grow ("like 
> weeds!").
> 
> Once we get the current, existing, under-review, chapters packaged into the 
> new book and available for purchase and download (maybe Lulu, not decided) - 
> available, in a couple of weeks, it will be updated approximately every 
> other week, both with additional reference material, and additional 
> narrative and diagrams.
> 
> One of our priorities (after we get through Stage 0 of the next few weeks) 
> is to in fact start giving each of the long Deep Dive Chapters enough 
> narrative lead to basically say exactly that - why you should care.
> 
> A longer-term priority is to improve the balance of narrative and hard-core 
> reference. Yeah, that will be a lot of pages. It already is. We were at 907 
> pages and I was about to drop in another 166 pages on update handlers when 
> O'Reilly threw up their hands and pulled the plug. I was estimating 1200 
> pages at that stage. And I'll probably have another 60-80 pages on update 
> request processors within a week or so. With more to come. That did include 
> a lot of hard-core material and example code for Lucene, which won't be in 
> the new Solr-only book. By focusing on an e-book the raw page count alone 
> becomes moot. We haven't given up on print - the intent is eventually to 
> have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3 
> to $5 each) and slimmer print volumes for people who don't need everything 
> in print.
> 
> In fact, we will likely offer the revamped initial chapters of the book as a 
> standalone introduction to Solr - narrative introduction ("why should you 
> care about Solr"), basic concepts of Lucene and Solr (and why you should 
> care!), brief tutorial walkthough of the major feature areas of Solr, and a 
> case study. The intent would be both e-book and a slim print volume (75 
> pages?).
> 
> Another priority (beyond Stage 0) is to develop a detailed roadmap diagram 
> of Solr and how applications can use Solr, and then use that to show how 
> each of the Deep Dive sections (heavy reference, but gradually adding more 
> narrative over time.)
> 
> We will probably be very open to requests - what people really wish a book 
> would actually do for them. The only request we won't be open to is to do it 
> all in only 300 pages.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Erick Erickson
> Sent: Wednesday, May 29, 2013 7:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Note on The Book
> 
> FWIW, picking up on Alexandre's point. One of my continual
> frustrations with virtually _all_
> technical books is they become endless pages of details without ever
> mentioning why
> the hell I should care. Unfortunately, explaining use-cases for
> everything would only make
> the book about 10,000 pages long. Siiigh.
> 
> I guess you can take this as a vote for narrative
> 
> Erick
> 
> On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky  
> wrote:
> > We'll have a blog for the book. We hope to have a first
> > raw/rough/partial/draft published as an e-book in maybe 10 days to 2 
> > weeks.
> > As soon as we get that process under control, we'll start the blog. I'll
> > keep your email on file and keep you posted.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Swati Swoboda
> > Sent: Tuesday, May 28, 2013 1:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Note on The Book
> >
> >
> > I'd definitely prefer the spiral bound as well. E-books are great and your
> > draft version seems very reasonably priced (aka I would definitely get 
> > it).
> >
> > Really looking forward to this. Is there a separate mailing list / etc. 
> > for
> > the book for those who would like to receive updates on the status of the
> > book?
> >
> > Thanks
> >
> > Swati Swoboda
> > Software Developer - Igloo Software
> > +1.519.489.4120  sswob...@igloosoftware.com
> >
> > Bring back Cake Fridays – watch a video you’ll actually like
> > http://vimeo.com/64886237
> >
> >
> > -Original M

Re: Seeming bug in ConcurrentUpdateSolrServer

2013-05-29 Thread Benson Margulies
I now understand the algorithm, but I don't understand why it is the way it is.

Consider one of these objects configure with a handful of threads and
a pretty big queue.

When the first request comes in, the object creates one runner. It
then won't create a second runner until the Q reaches 1/2-full.

If the idea is that we want to pile up 'a lot' (1/2-of-a-q) of work
before sending any of it, why start that first runner?

On Wed, May 29, 2013 at 2:45 PM, Benson Margulies  wrote:
> Ah. So now I have to find some other explanation of why it never
> creates more than one thread, even when I make a very deep queue and
> specify 6 threads.
>
> On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar
>  wrote:
>> On Wed, May 29, 2013 at 11:29 PM, Benson Margulies 
>> wrote:
>>
>>> The comment here is clearly wrong, since there is no division by two.
>>>
>>> I think that the code is wrong, because this results in not starting
>>> runners when it should start runners. Am I misanalyzing?
>>>
>>> if (runners.isEmpty() || (queue.remainingCapacity() < queue.size() // queue
>>>
>>>   // is
>>>
>>>   // half
>>>
>>>   // full
>>>
>>>   // and
>>>
>>>   // we
>>>
>>>   // can
>>>
>>>   // add
>>>
>>>   // more
>>>
>>>   // runners
>>>   && runners.size() < threadCount)) {
>>>
>>
>>
>> queue.remainingCapacity() returns capacity - queue.size() so the comment is
>> correct.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.


RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-29 Thread Bryan Loofbourrow
Andy,

> I don't understand why it's taking 7 secs to return highlights. The size
> of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to
> 1024 for this verification purpose and that should be more than enough.
> The processor is plenty powerful enough as well.
>
> Running VisualVM shows all my CPU time being taken by mainly these 3
> methods:
>
> org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
> org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
> org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap(
> )

That is a strange and interesting set of things to be spending most of
your CPU time on. The implication, I think, is that the number of term
matches in the document for terms in your query (or, at least, terms
matching exact words or the beginning of phrases in your query) is
extremely high. Perhaps that's coming from this "partial word match" you
mention -- how does that work?

-- Bryan

> My guess is that this has something to do with how I'm handling partial
> word matches/highlighting. I have setup another request handler that
> only searches the whole word fields and it returns in 850 ms with
> highlighting.
>
> Any ideas?
>
> - Andy
>
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent: Monday, May 20, 2013 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Slow Highlighter Performance Even Using
> FastVectorHighlighter
>
> My guess is that the problem is those 200M documents.
> FastVectorHighlighter is fast at deciding whether a match, especially a
> phrase, appears in a document, but it still starts out by walking the
> entire list of term vectors, and ends by breaking the document into
> candidate-snippet fragments, both processes that are proportional to the
> length of the document.
>
> It's hard to do much about the first, but for the second you could
> choose
> to expose FastVectorHighlighter's FieldPhraseList representation, and
> return offsets to the caller rather than fragments, building up your own
> snippets from a separate store of indexed files. This would also permit
> you to set stored="false", improving your memory/core size ratio, which
> I'm guessing could use some improving. It would require some work, and
> it
> would require you to store a representation of what was indexed outside
> the Solr core, in some constant-bytes-to-character representation that
> you
> can use offsets with (e.g. UTF-16, or ASCII+entity references).
>
> However, you may not need to do this -- it may be that you just need
> more
> memory for your search machine. Not JVM memory, but memory that the O/S
> can use as a file cache. What do you have now? That is, how much memory
> do
> you have that is not used by the JVM or other apps, and how big is your
> Solr core?
>
> One way to start getting a handle on where time is being spent is to set
> up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight
> queries, and look at where the time is being spent. If it's mostly in
> methods that are just reading from disk, buy more memory. If you're on
> Linux, look at what top is telling you. If the CPU usage is low and the
> "wa" number is above 1% more often than not, buy more memory (I don't
> know
> why that wa number makes sense, I just know that it has been a good rule
> of thumb for us).
>
> -- Bryan
>
> > -Original Message-
> > From: Andy Brown [mailto:andy_br...@rhoworld.com]
> > Sent: Monday, May 20, 2013 9:53 AM
> > To: solr-user@lucene.apache.org
> > Subject: Slow Highlighter Performance Even Using FastVectorHighlighter
> >
> > I'm providing a search feature in a web app that searches for
> documents
> > that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
> > etc). Currently there are about 3000 documents and this will continue
> to
> > grow. I'm providing full word search and partial word search. For each
> > document, there are three source fields that I'm interested in
> searching
> > and highlighting on: name, description, and content. Since I'm
> providing
> > both full and partial word search, I've created additional fields that
> > get tokenized differently: name_par, description_par, and content_par.
> > Those are indexed and stored as well for querying and highlighting. As
> > suggested in the Solr wiki, I've got two catch all fields text and
> > text_par for faster querying.
> >
> > An average search results page displays 25 results and I provide
> paging.
> > I'm just returning the doc ID in my Solr search results and response
> > times have been quite good (1 to 10 ms). The problem in performance
> > occurs when I turn on highlighting. I'm already using the
> > FastVectorHighlighter and depending on the query, it has taken as long
> > as 15 seconds to get the highlight snippets. However, this isn't
> always
> > the case. Certain query terms result in 1 sec or 

Solr query performance tool

2013-05-29 Thread Spyros Lambrinidis
Hi,

Lately we are seeing increased latency times on solr and we would like to
know which queries / facet searches are the most time consuming and heavy
for our system.

Is there any tool equivalent to the MySQL slow query log? Does Solr keep the times
each query takes in some log?

Thank you for your help.

-S.


-- 
Spyros Lambrinidis
Head of Engineering & Commando of
PeoplePerHour.com
Evmolpidon 23
118 54, Gkazi
Athens, Greece
Tel: +30 210 3455480

Follow us on Facebook 
Follow us on Twitter 


Re: Problem with PatternReplaceCharFilter

2013-05-29 Thread Jack Krupansky
Just replace the stripped markup with the equivalent number of spaces to 
maintain positions.
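One way to read that advice, as a standalone pre-processing sketch rather than
a Solr CharFilter (it assumes the searchable text lives in content="..."
attributes, as in the original question): keep the attribute values in place
and overwrite every other character with a space, so the blanked string has
exactly the same length as the stored original and token offsets still line up.

    import java.util.Arrays;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MarkupBlanker {
        private static final Pattern CONTENT = Pattern.compile("content=\"([^\"]*)\"");

        public static String blank(String xml) {
            char[] out = new char[xml.length()];
            Arrays.fill(out, ' ');
            Matcher m = CONTENT.matcher(xml);
            while (m.find()) {
                // copy each attribute value into the positions it already occupies
                for (int i = m.start(1); i < m.end(1); i++) {
                    out[i] = xml.charAt(i);
                }
            }
            return new String(out);
        }

        public static void main(String[] args) {
            String doc = "<line content=\"the content to search\"/>"
                       + "<line content=\"in the second content line\"/>";
            System.out.println(blank(doc));
        }
    }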


Was there some specific problem you were encountering?

-- Jack Krupansky

-Original Message- 
From: jasimop

Sent: Wednesday, May 29, 2013 4:12 PM
To: solr-user@lucene.apache.org
Subject: Problem with PatternReplaceCharFilter

Hi,

I have a Problem when using PatternReplaceCharFilter when indexing a field.
I created the following field:
   
 
   
   -->
   
   
   
 
 
   
   
   
 
   

And I created a field that is indexed and stored:


I need to index a document with such a structure in this field:


Basically I have some sort of XML structure, i need only to search in the
"content" attribute, but when highlighting i need to get back to the
enclosing XML tags.

So with the 3 Regex I want to remove all unwanted tags and tokenize/index
only the important data.
I know that I could use HTMLStripCharFilterFactory but then also the tag
names, attribute names and values get indexed. And I don't want to search in
that content too.

I read the following in the doc:
NOTE: If you produce a phrase that has different length to source string and
the field is used for highlighting for a term of the phrase, you will face a
trouble.

The thing is, why is this the case? When running the analysis from the Solr
admin, the CharFilters generate
"the content to search in the second content line", which looks perfect, but
then the StandardTokenizer
gets the start and end positions of the tokens wrong. Why is that?
Is there another solution to my problem?
Could I use the following method I saw in the doc of
PatternReplaceCharFilter:
protected int correct(int currentOff) Documentation: Retrieve the corrected
offset.

How could I solve such a task?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Note on The Book

2013-05-29 Thread Jack Krupansky

Markus,

Okay, more pages it is!

-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Wednesday, May 29, 2013 5:35 PM
To: solr-user@lucene.apache.org
Subject: RE: Note on The Book

Jack,

I'd prefer tons of information instead of a meager 300 page book that leaves 
a lot of questions. I'm looking forward to a paperback or hardcover book and 
price doesn't really matter, it is going to be worth it anyway.


Thanks,
Markus



-Original message-

From:Jack Krupansky 
Sent: Wed 29-May-2013 15:10
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

Erick, your point is well taken. Although my primary interest/skill is to
produce a solid foundation reference (including tons of examples), the real
goal is to then build on top of that foundation.

While I focus on the hard-core material - which really does include some
narrative and lots of examples in addition to tons of "mere" reference, my
co-author, Ryan Tabora, will focus almost exclusively on... narrative and
diagrams.

And when I say reference, I also mean lots of examples. Even as the
hard-core reference stabilizes, the examples will continue to grow ("like
weeds!").

Once we get the current, existing, under-review, chapters packaged into 
the
new book and available for purchase and download (maybe Lulu, not decided) -
available, in a couple of weeks, it will be updated approximately every
other week, both with additional reference material, and additional
narrative and diagrams.

One of our priorities (after we get through Stage 0 of the next few weeks)
is to in fact start giving each of the long Deep Dive Chapters enough
narrative lead to basically say exactly that - why you should care.

A longer-term priority is to improve the balance of narrative and 
hard-core
reference. Yeah, that will be a lot of pages. It already is. We were at 907
pages and I was about to drop in another 166 pages on update handlers when
O'Reilly threw up their hands and pulled the plug. I was estimating 1200
pages at that stage. And I'll probably have another 60-80 pages on update
request processors within a week or so. With more to come. That did include
a lot of hard-core material and example code for Lucene, which won't be in
the new Solr-only book. By focusing on an e-book the raw page count alone
becomes moot. We haven't given up on print - the intent is eventually to
have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3
to $5 each) and slimmer print volumes for people who don't need everything
in print.

In fact, we will likely offer the revamped initial chapters of the book as a
standalone introduction to Solr - narrative introduction ("why should you
care about Solr"), basic concepts of Lucene and Solr (and why you should
care!), brief tutorial walkthough of the major feature areas of Solr, and 
a

case study. The intent would be both e-book and a slim print volume (75
pages?).

Another priority (beyond Stage 0) is to develop a detailed roadmap diagram
of Solr and how applications can use Solr, and then use that to show how
each of the Deep Dive sections fits in (heavy reference, but gradually adding
more narrative over time).

We will probably be very open to requests - what people really wish a book
would actually do for them. The only request we won't be open to is to do it
all in only 300 pages.

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky 
wrote:
> We'll have a blog for the book. We hope to have a first
> raw/rough/partial/draft published as an e-book in maybe 10 days to 2
> weeks.
> As soon as we get that process under control, we'll start the blog. I'll
> keep your email on file and keep you posted.
>
> -- Jack Krupansky
>
> -Original Message- From: Swati Swoboda
> Sent: Tuesday, May 28, 2013 1:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Note on The Book
>
>
> I'd definitely prefer the spiral bound as well. E-books are great and your
> draft version seems very reasonably priced (aka I would definitely get
> it).
>
> Really looking forward to this. Is there a separate mailing list / etc.
> for the book for those who would like to receive updates on the status of
> the book?
>
> Thanks
>
> Swati Swoboda
> Software Developer - Igloo Software
> +1.519.489.4120  sswob...@igloosoftware.com
>
> Bring back Cake Fridays – watch a video you’ll actually like
> http://vimeo.com/64886237
>
>
> -Original Message-
> From: Jack Krupansky 

java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread bbarani
Hi,

I am overriding the query component and creating a custom component. I am
using _responseDocs from org.apache.solr.handler.component.ResponseBuilder
to get the values. I have my component in the same package
(org.apache.solr.handler.component) so it can access the _responseDocs value.

Everything works fine when I run the test for this component but I am
getting the below error when I package the custom component in a jar and
place it in the lib directory (inside solr/lib - using the basic Jetty
configuration).

I assume this is because different class loaders load the classes at runtime.
Is there a way to resolve this?

java.lang.IllegalAccessError: tried to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.WPFastDistributedQueryComponent
java.lang.RuntimeException: java.lang.IllegalAccessError: tried
to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.CustomComponent
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalAccessError: tried to access field
org.apache.solr.handler.component.ResponseBuilder._responseDocs from class
org.apache.solr.handler.component.WPFastDistributedQueryComponent
at
org.apache.solr.handler.component.WPFastDistributedQueryComponent.handleResponses(WPFastDistributedQueryComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
... 26 more




--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for Mongolian language

2013-05-29 Thread Upayavira


On Wed, May 29, 2013, at 09:34 PM, bbarani wrote:
> Check out..
> 
> wiki.apache.org/solr/LanguageAnalysis
> 
> For some reason the above site takes a long time to open..

There's a known performance issue with the wiki. Admins are working on
it.

Upayavira


Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread bbarani
My assumptions were right :)

I was able to fix this error by copying all my custom jars into the
webapp/WEB-INF/lib directory, and everything started working.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904p4066906.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 4.3: write.lock is not removed

2013-05-29 Thread Zhang, Lisheng
Hi,
 
I recently upgraded Solr from 3.6.1 to 4.3. It works well, but I noticed that
after finishing indexing
 
write.lock
 
is NOT removed. Later if I index again it still works OK. Only after I shut down
Tomcat is write.lock removed. This behavior caused some problems, like not being
able to use Luke to inspect the indexed data.
 
I did not see any error/warning messages.
 
Is this the designed behavior? Can I get the old behavior (write.lock removed
after commit) through configuration?
 
Thanks very much for helps, Lisheng


Re: Solr query performance tool

2013-05-29 Thread Otis Gospodnetic
Hi,

The regular Solr log logs Qtime for each query.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 29, 2013 5:59 PM, "Spyros Lambrinidis" 
wrote:

> Hi,
>
> Lately we are seeing increased latency times on solr and we would like to
> know which queries / facet searches are the most time consuming and heavy
> for our system.
>
> Is there any tool equivalent to the MySQL slow log? Does Solr keep the times
> each query takes in some log?
>
> Thank you for your help.
>
> -S.
>
>
> --
> Spyros Lambrinidis
> Head of Engineering & Commando of
> PeoplePerHour.com
> Evmolpidon 23
> 118 54, Gkazi
> Athens, Greece
> Tel: +30 210 3455480
>
> Follow us on Facebook 
> Follow us on Twitter 
>


Re: Solr query performance tool

2013-05-29 Thread Erick Erickson
The qtimes are in the solr log, you'll see lines like:
params={q=*:*} hits=32 status=0 QTime=5

QTime is the time spent serving the query but does NOT include
assembling the response.

Best
Erick

On Wed, May 29, 2013 at 5:58 PM, Spyros Lambrinidis
 wrote:
> Hi,
>
> Lately we are seeing increased latency times on solr and we would like to
> know which queries / facet searches are the most time consuming and heavy
> for our system.
>
> Is there any tool equivalent to the MySQL slow log? Does Solr keep the times
> each query takes in some log?
>
> Thank you for your help.
>
> -S.
>
>
> --
> Spyros Lambrinidis
> Head of Engineering & Commando of
> PeoplePerHour.com
> Evmolpidon 23
> 118 54, Gkazi
> Athens, Greece
> Tel: +30 210 3455480
>
> Follow us on Facebook 
> Follow us on Twitter 


Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.

2013-05-29 Thread Chris Hostetter

: Subject: java.lang.IllegalAccessError when invoking protected method from
: another class in the same package path but different jar.
...
: I am overriding the query component and creating a custom component. I am
: using _responseDocs from org.apache.solr.handler.component.ResponseBuilder
: to get the values. I have my component in same package

_responseDocs is not "protected", it is "package-private", which is why you
can't access it from a subclass in another *runtime* package.  Even if
you put your custom component in the same org.apache.solr... package 
namespace, the runtime package is determined by the ClassLoader combined 
with the source package...

http://www.cooljeff.co.uk/2009/05/03/the-subtleties-of-overriding-package-private-methods/

...this is helpful to ensure plugins don't attempt to do things they
shouldn't.

In general, the ResponseBuilder class internals aren't very friendly in 
terms of allowing custom components to interact with the intermediate 
results of other built-in components -- it's primarily designed around
letting other internal Solr components share data with each other in
(hopefully) well tested ways.  Note that there is even a specific comment
one line directly above the declaration of _responseDocs that alludes to 
it and several other variables being deliberately package-private...

  /* private... components that don't own these shouldn't use them */
  SolrDocumentList _responseDocs;
  StatsInfo _statsInfo;
  TermsComponent.TermsHelper _termsHelper;
  SimpleOrderedMap<List<NamedList<Object>>> _pivots;

If you want access to the SolrDocumentList containing the query results, 
the only safe way/time to do that is by fetching it out of the response 
(ResponseBuilder.rsp) after the QueryComponent has put it there in its
finishStage -- until then ResponseBuilder._responseDocs may not be
correct (ie: distributed search, grouped search, etc...)
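
For example, roughly - the component name here is made up, it only covers the
distributed case described above, and it assumes the component is registered
after QueryComponent so that QueryComponent's finishStage has already added
"response" to the response:

import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical component: reads the merged result docs from the response
// rather than touching the package-private ResponseBuilder._responseDocs.
public class MyPostQueryComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) { /* nothing to prepare */ }

  @Override
  public void process(ResponseBuilder rb) { /* nothing to do per shard */ }

  @Override
  public void finishStage(ResponseBuilder rb) {
    // QueryComponent adds the merged docs under "response" at this stage.
    if (rb.stage != ResponseBuilder.STAGE_GET_FIELDS) return;
    Object docs = rb.rsp.getValues().get("response");
    if (docs instanceof SolrDocumentList) {
      SolrDocumentList merged = (SolrDocumentList) docs;
      // ... work with the merged documents here ...
    }
  }

  @Override
  public String getDescription() { return "example post-query component"; }

  @Override
  public String getSource() { return "example"; }
}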

-Hoss


multiple field join?

2013-05-29 Thread cmd.ares
http://wiki.apache.org/solr/Join  
I found that the Solr join is actually like a SQL subquery. Does Solr support a
3-table join? The SQL is like this:
SELECT xxx, yyy 
FROM collection1
WHERE 
outer_id IN (SELECT inner_id FROM collection1 where zzz = "vvv")
and 
outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = "xxx")
and 
outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = "rrr")

How do I write the Solr request URL?
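
Would something like this work - each IN subquery as its own {!join} filter
query, with multiple fq parameters ANDed together (URL-encoding omitted for
readability, and assuming everything is in the same core)?

http://localhost:8983/solr/collection1/select?q=*:*&fl=xxx,yyy
  &fq={!join from=inner_id to=outer_id}zzz:vvv
  &fq={!join from=inner_id2 to=outer_id2}ttt:xxx
  &fq={!join from=inner_id3 to=outer_id3}ppp:rrr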
thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-field-join-tp4066930.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with PatternReplaceCharFilter

2013-05-29 Thread jasimop
Honestly, I have no idea how to do that.
PatternReplaceCharFilter doesn't seem to have a parameter like
preservePositions="true" and
optionally fillCharacter=" ".
And I don't think I can express this simply as a regex. How would I count, in
a pure regex, the length difference before and after the match?

Well, the specific problem is that when highlighting, the term positions are
wrong and the result is not a valid XML structure that I can handle.
I expect something like
  [snippet partly stripped by the mail archive; what remains is:
  search in" ee="ff" />]
but I can [rest of the sentence stripped by the mail archive]
  [snippet partly stripped by the mail archive; what remains is:
  tLineaa="bb" cc="dd" content="the content to search
  in" ee="ff" />]

Thanks for your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-PatternReplaceCharFilter-tp4066869p4066939.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not able to search Spanish word with ascent in solr

2013-05-29 Thread Deep Lotia
Hi,

I am having the same kind of issue. I am not able to search accented Spanish
characters, e.g. Según, próximos, etc.

I have a field called attr_content which holds the content of a PDF file whose
contents are in Spanish. I am using Apache Tika to index the contents of the PDF
file. I have written a Java class which uses the Apache Tika classes to read
the PDF contents and index them into Solr 3.5.

Is there anything I could have missed? Could it be because of encoding issues?

Please help.

Deep



Automatic cross linking

2013-05-29 Thread It-forum

Hello,

I'm looking to use Solr for creating cross linking in text.

For example: I'd like to be able to send Solr a text field - an
article from my blog - and have Solr use a script/method to parse
the text, find all matching category terms, and mark up the results.


Do you have any suggestions, documentation, tutorials, or source code :)
that could help me realise this?


Regards.

David