Re: New operator.

2013-06-16 Thread Mikhail Khludnev
Hello Yanis,

Two options.
1. Create your own SearchComponent, which adds the filter query to the request, and add
it to the SearchHandler. http://wiki.apache.org/solr/SearchComponent
2. Create a QParserPlugin and invoke it via a request param:
...&fq={!yanisqp}applyvector&...
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
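
(A minimal sketch of option 2 against the Solr 4.x plugin API; the class name
is illustrative and callProprietaryEngine is a placeholder for your engine:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.queries.TermsFilter;
  import org.apache.lucene.search.ConstantScoreQuery;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;
  import org.apache.solr.search.SyntaxError;

  public class YanisQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() throws SyntaxError {
          // Ask the proprietary engine for the matching keys (placeholder).
          List<String> keys = callProprietaryEngine(qstr);
          // Turn the answer vector into a constant-scoring filter query,
          // so it takes part in the boolean query like any other fq.
          List<Term> terms = new ArrayList<Term>();
          for (String key : keys) {
            terms.add(new Term("id", key));
          }
          return new ConstantScoreQuery(new TermsFilter(terms));
        }
      };
    }

    private List<String> callProprietaryEngine(String input) {
      throw new UnsupportedOperationException("integrate your engine here");
    }
  }

Register it in solrconfig.xml with
<queryParser name="yanisqp" class="com.example.YanisQParserPlugin"/> and the fq
above will route through it.)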


On Sun, Jun 16, 2013 at 10:01 AM, Yanis Kakamaikis <
yanis.kakamai...@gmail.com> wrote:

> Hi all, I want to add a new operator to my Solr. I need that operator
> to call my proprietary engine and build an answer vector for Solr, in such a
> way that this vector will be part of the boolean query at the next step. How
> do I do that?
> Thanks
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr large boolean filter

2013-06-16 Thread Mikhail Khludnev
Right.
FieldCacheTermsFilter is an option. You need to create your own QParserPlugin
which yields a FieldCacheTermsFilter, and hook it in as ...&fq={!idsqp
cache=false}...
Mind disabling caching! Mind term encoding due to the field type!
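
(A minimal sketch of such a plugin for Solr 4.x, assuming the ids arrive
whitespace-separated as the parser's value; the name IdsQParserPlugin is
illustrative and matches the idsqp example above:

  import org.apache.lucene.search.ConstantScoreQuery;
  import org.apache.lucene.search.FieldCacheTermsFilter;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class IdsQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() {
          // One field-cache lookup per candidate doc instead of one
          // BooleanQuery clause per id.
          String field = localParams.get("f", "id");
          String[] ids = qstr.trim().split("\\s+");
          return new ConstantScoreQuery(new FieldCacheTermsFilter(field, ids));
        }
      };
    }
  }

Invoked as fq={!idsqp cache=false f=id}1 2 3 ... 50950; per the warnings above,
keep cache=false and make sure the ids match the indexed form of the field
type.)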

I also suggest checking how much time is spent tokenizing the id list. I once
got a measurable gain by using a more efficient encoding for this param (try
fixed length or vint).

There is one more gain when the core query is highly selective and the id
filter is weakly selective: in this case using explicit post-filtering
(what a hack, btw) is desirable. See
http://yonik.com/posts/advanced-filter-caching-in-solr/

From my experience, the proper solution for such problems is moving to one
of the joins or to ExternalFileField.




On Sun, Jun 16, 2013 at 2:49 AM, Igor Kustov  wrote:

> I know i'm not the first one with this problem.
>
> I'm currently using solr 4.2.1 with approximately 10 mln documents in the
> index.
>
> The index is updated frequently.
>
> The filter query is just one big boolean OR query by id.
>
> fq=id:(1 2 3 4 ... 50950)
>
> The ids list is always different and not sequential.
>
> The problem is that query performance is not so good, as you can imagine.
>
> In some particular cases I'm able to do filtering based on different
> fields,
> but in some cases (like 30-40% of all queries) I still end up with this
> large id filter.
>
> I'm looking for ways to improve this query's performance.
>
> It doesn't seem like solr join could be applied there.
>
> Another option that I found is to somehow use Lucene FieldCacheTermsFilter.
> Is it worth a try?
>
> Maybe I've missed some other options?
>
>
>
>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Erick Erickson
Optimizing will _temporarily_ double the index size,
but it shouldn't be permanent. Is it possible that
you have inadvertently told Solr to keep an extra
snapshot? I think it's "numberToKeep" in your
replication handler, but I'm going from memory here.
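
(For reference, the snapshot retention being recalled here is exercised through
the replication handler's backup command; URL illustrative, and check the
parameter against your Solr version:

  http://localhost:9002/solr35/collection1/replication?command=backup&numberToKeep=1

which keeps only the most recent snapshot on disk.)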

Best
Erick


On Fri, Jun 14, 2013 at 2:15 AM, Montu v Boda
 wrote:
> Hi,
>
> I have replicated my index from 1.4 to 3.5, and after replication I tried to
> optimize the index in 3.5 with the URL below.
> http://localhost:9002/solr35/collection1/update?optimize=true&commit=true
>
> When I optimize the index in 3.5, it increases the index size to double.
>
> In 1.4 the size of the index is 428GB, and after optimization in 3.5 it
> becomes 791 GB.
>
> Thanks & Regards
> Montu v Boda
>
>
>


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Erick Erickson
John:

If you'd like to add your experience to the Wiki, create
an ID and let us know what it is and we'll add you to the
contributors list. Unfortunately we had problems with
spam pages, so we added this step.

Make sure you include your logon in the request.

Thanks,
Erick

On Fri, Jun 14, 2013 at 8:55 AM, John Nielsen  wrote:
> Sorry for not getting back to the list sooner. It seems like I finally
> solved the memory problems by following Toke's instruction of splitting the
> cores up into smaller chunks.
>
> After some major refactoring, our 15 cores have now turned into ~500 cores
> and our memory consumption has dropped dramatically. Running 200 webshops now
> actually uses less memory than our 24 test shops did before.
>
> Thank you to everyone who helped, and especially to Toke.
>
> I looked at the wiki, but could not find any reference to this unintuitive
> way of using memory. Did I miss it somewhere?
>
>
>
> On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson 
> wrote:
>
>> Hmmm. There has been quite a bit of work lately to support a couple of
>> things that might be of interest (4.3, which Simon cut today, probably
>> available to all mid next week at the latest). Basically, you can
>> choose to pre-define all the cores in solr.xml (so-called "old style")
>> _or_ use the new-style solr.xml which uses "auto-discover" mode to
>> walk the indicated directory and find all the cores (indicated by the
>> presence of a 'core.properties' file). Don't know if this would make
>> your particular case easier, and I should warn you that this is
>> relatively new code (although there are some reasonable unit tests).
>>
>> You also have the option to only load the cores when they are
>> referenced, and only keep N cores open at a time (loadOnStartup and
>> transient properties).
>>
>> See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
>> http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond
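>>
>> (A core.properties for such a transient core might look like this; keys per
>> the wiki pages above, values illustrative:
>>
>>   name=shop0042
>>   loadOnStartup=false
>>   transient=true
>> )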
>>
>> Note, the docs are somewhat sketchy, so if you try to go down this
>> route let us know anything that should be improved (or you can be
>> added to the list of wiki page contributors and help out!)
>>
>> Best
>> Erick
>>
>> On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen  wrote:
>> >> You are missing an essential part: Both the facet and the sort
>> >> structures needs to hold one reference for each document
>> >> _in_the_full_index_, even when the document does not have any values in
>> >> the fields.
>> >>
>> >
>> > Wow, thank you for this awesome explanation! This is where the penny
>> > dropped for me.
>> >
>> > I will definitely move to a multi-core setup. It will take some time and
>> a
>> > lot of re-coding. As soon as I know the result, I will let you know!
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Med venlig hilsen / Best regards
>> >
>> > *John Nielsen*
>> > Programmer
>> >
>> >
>> >
>> > *MCB A/S*
>> > Enghaven 15
>> > DK-7500 Holstebro
>> >
>> > Kundeservice: +45 9610 2824
>> > p...@mcb.dk
>> > www.mcb.dk
>>
>
>
>
> --
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk


Re: Solr 3.5 Optimization takes index file size almost double

2013-06-16 Thread Erick Erickson
Unix or Windows? And are the files
still there after restarting Solr?

Best
Erick

On Fri, Jun 14, 2013 at 10:54 AM, Pravin Bhutada
 wrote:
> One thing that you can try is to optimize incrementally. Instead of optimizing
> to 1 segment, optimize to 100, then 50, 25, 10, 5, 2, 1.
> After each step, the index size should go down. This way you don't have to
> wait 7 hours to get some results.
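>
> (With the update handler this can be expressed via the maxSegments
> parameter; URL illustrative:
>
>   /update?optimize=true&maxSegments=100&commit=true
>
> then repeat with maxSegments=50, 25, ... down to 1.)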
>
>
> Pravin
>
>
> On Fri, Jun 14, 2013 at 10:45 AM, Viresh Modi <
> viresh.m...@highqsolutions.com> wrote:
>
>> Hi pravin
>>
>> I have nearly 2 TB of disk space for optimization. And after optimization I
>> get a QTime response of nearly 7 hours (which, obviously, is reported in
>> milliseconds). So I don't think it is an issue of disk space.
>>
>>
>> Thanks & Regards,
>> Viresh modi
>> Mobile: 91 (0) 9714567430
>>
>>
>> On 14 June 2013 20:10, Pravin Bhutada  wrote:
>>
>> > Hi Viresh,
>> >
>> > How much free disk space do you have? If you don't have enough space
>> > on disk, the optimization process stops and rolls back to some intermediate
>> > state.
>> >
>> >
>> > Pravin
>> >
>> >
>> >
>> >
>> > On Fri, Jun 14, 2013 at 2:50 AM, Viresh Modi <
>> > viresh.m...@highqsolutions.com
>> > > wrote:
>> >
>> > > Hi Rafal
>> > >
>> > > Here I have attached a snapshot of the Solr index files as well.
>> > > So can you look into this? If any other information is required
>> > > regarding it, then let me know.
>> > >
>> > >
>> > > Thanks & Regards,
>> > > Viresh modi
>> > > Mobile: 91 (0) 9714567430
>> > >
>> > >
>> > > On 13 June 2013 17:41, Rafał Kuć  wrote:
>> > >
>> > >> Hello!
>> > >>
>> > >> Do you have some backup after commit in your configuration? It would
>> > >> also be good to see what your index directory looks like; can you list
>> > >> it?
>> > >>
>> > >> --
>> > >> Regards,
>> > >>  Rafał Kuć
>> > >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>> > >>
>> > >> > Thanks Rafal for the reply...
>> > >>
>> > >> > I agree with you. But actually, after optimization it does not reduce
>> > >> > the size; it remains double. So is there anything we missed or need
>> > >> > to do to achieve an index size reduction?
>> > >>
>> > >> > Is there any special setting we need to configure for replication?
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> > On 13 June 2013 16:53, Rafał Kuć  wrote:
>> > >>
>> > >> >> Hello!
>> > >> >>
>> > >> >> The optimize command needs to rewrite the segments, so while it is
>> > >> >> still working you may see the index size doubled. However, after
>> > >> >> it is finished the index size will usually be lower compared to the
>> > >> >> index size before the optimize.
>> > >> >>
>> > >> >> --
>> > >> >> Regards,
>> > >> >>  Rafał Kuć
>> > >> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch
>> > >> >>
>> > >> >> > Hi,
>> > >> >> > I have a Solr 1.4.1 server with an index file size of 428GB. When I
>> > >> >> > upgrade the 1.4.1 server to Solr 3.5.0 by the replication method,
>> > >> >> > the size remains the same. But when I optimize the index on the
>> > >> >> > Solr 3.5.0 instance, its size reaches 791GB. So what is the
>> > >> >> > solution to keep the size the same or smaller?
>> > >> >> > I optimized Solr 3.5 with the query:
>> > >> >> > /update?optimize=true&commit=true
>> > >> >>
>> > >> >> > Thanks & regards
>> > >> >> > Viresh Modi
>> > >> >>
>> > >> >>
>> > >>
>> > >>
>> > >
>> >
>>
>>


Re: Replicas and soft commit

2013-06-16 Thread Erick Erickson
You're mixing things up pretty thoroughly 

SolrCloud with leaders and replicas is orthogonal
to Master/Slave setups; generally people use one
or the other. Master/Slave setups don't get NRT
updates at all. I'm a little surprised that your
setup works. It sounds like you have replication
set up but then send updates individually to the slaves,
then rely on updating the master to overwrite
the full index on the slave? If this is what you're
doing, you should stop. It's _really_
dangerous to send updates to slaves in a master/slave
situation; at some point the slave will think it's
ahead of the master and will stop replicating and
be out of sync.

So, you really have to choose one or the other and
go with that. If you require NRT, set up SolrCloud
and don't worry about it. All updates go to all nodes
and they all get the soft commit...
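
(For reference, NRT visibility in SolrCloud is usually tuned with autoSoftCommit
in solrconfig.xml rather than per-request commits; a minimal sketch, values
illustrative:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>1000</maxTime>       <!-- searchers see updates within ~1s -->
    </autoSoftCommit>
    <autoCommit>
      <maxTime>60000</maxTime>      <!-- durable hard commit every minute -->
      <openSearcher>false</openSearcher>
    </autoCommit>
  </updateHandler>
)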

Best
Erick

On Fri, Jun 14, 2013 at 11:11 AM, Giovanni Bricconi
 wrote:
> I have recently upgraded our application from solr 3.6 to solr 4.2.1, and I
> have just started learning about soft commits and partial updates.
>
> Currently I have one indexing node and 3 replicas of the same core, and
> every modification goes through a dih delta index. This is usually ok but I
> have some special cases where updates should be made visible very quickly.
>
> As I have seen with my first tests - it is possible to send partial updates
> and soft commits to each replica and to the indexer - and when the indexer
> gets a hard commit every replica is realigned.
>
> Is this the right approach or am I misunderstanding how to use this
> feature?
>
> I don't see soft commit propagation to the replicas when sending updates to
> the indexer only: is this true, or have I maybe not changed some
> configuration files when porting the application to Solr 4?
>
> Giovanni


Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-16 Thread Erick Erickson
Al:

As it happens, I hope sometime today to put up a patch for SOLR-4910
that should harden up many things in persisting solr.xml; I'll be sure
to include this. It's kind of a pain to create an automated test for
this, so I'll give it a whirl manually.

As you say, most of this is going away in 5.0, but it needs to work for 4.x.

And when I get the patch up, if you could give it a "real world" try
it'd be great!

Thanks,
Erick

On Fri, Jun 14, 2013 at 6:15 PM, Al Wold  wrote:
> Hi,
> I'm working on setting up a solr cloud test environment, and the target 
> environment I need to put it in has multiple webapps per tomcat instance. 
> With that in mind, I wanted/had to avoid putting any configs in system 
> properties. I tried putting the zkHost in solr.xml, like this:
>
>> <solr ...>
>>   <cores ... zkHost="..." hostContext="/"/>
>> </solr>
>
> Everything works fine when I first start things up, create collections, 
> upload docs, search, etc. Creating the collection, however, modifies the 
> solr.xml file, and doesn't keep the zkHost setting:
>
>> <solr ...>
>>   <cores ... hostContext="/">
>>     <core instanceDir="directory_shard2_replica1/" transient="false"
>>      name="directory_shard2_replica1" collection="directory"/>
>>     <core instanceDir="directory_shard1_replica1/" transient="false"
>>      name="directory_shard1_replica1" collection="directory"/>
>>   </cores>
>> </solr>
>
>
> With that in mind, once I restart tomcat, it no longer knows it's supposed to 
> be talking to zookeeper, so it looks for local configs and blows up.
>
> I traced this back to the code in CoreContainer.java, in the method 
> persistFile(), where it seems to contain no code to write out the zkHost when 
> it updates solr.xml. I upped the logging on my solr instance to verify this 
> code is executing, so I'm pretty sure it's the right spot.
>
> Is anyone else using zkHost in their solr.xml successfully? I can't see how 
> it would work given this problem.
>
> Does this seem like a bug? If so, I can probably file a report and submit a 
> patch. It seems like this problem may become a non-issue in 5.0, based on 
> comments in the code and some of the discussion in JIRA, but I'm not sure how 
> far off that is.
>
> Thanks!
>
> -Al Wold
>


Re: New operator.

2013-06-16 Thread Jack Krupansky

It all depends on what you mean by an "operator".

Start by describing in more detail what problem you are trying to solve.

And how do you expect your users or applications to use this "operator"? 
Give some examples.


Solr and Lucene do not have "operators" per se, except in query parser 
syntax, but that is hard-wired into the individual query parsers.


-- Jack Krupansky

-Original Message- 
From: Yanis Kakamaikis

Sent: Sunday, June 16, 2013 2:01 AM
To: solr-user@lucene.apache.org
Subject: New operator.

Hi all, I want to add a new operator to my Solr. I need that operator
to call my proprietary engine and build an answer vector for Solr, in such a
way that this vector will be part of the boolean query at the next step. How
do I do that?
Thanks 



Re: Solr large boolean filter

2013-06-16 Thread Jack Krupansky
Whenever I see one of these "big" query filters, my first thought is that 
there is something wrong with the application data model.


Where does the long list of IDs come from? Somebody must be generating and/or 
storing them, right? Why not store them in Solr, right in the data model?


Maybe store them as a separate collection and do a join operation.
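
(A sketch of that join at query time, using a hypothetical sidecar core
"idlists" that holds list_id/doc_id pairs:

  fq={!join fromIndex=idlists from=doc_id to=id}list_id:12345

Note the stock {!join} parser only supports fromIndex for a core living in the
same JVM.)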

How unique are they? I mean, a large number of large filters is not going to 
be very efficient.


In any case, start by looking at your data model.

-- Jack Krupansky

-Original Message- 
From: Igor Kustov

Sent: Saturday, June 15, 2013 6:49 PM
To: solr-user@lucene.apache.org
Subject: Solr large boolean filter

I know i'm not the first one with this problem.

I'm currently using solr 4.2.1 with approximately 10 mln documents in the
index.

The index is updated frequently.

The filter query is just one big boolean OR query by id.

fq=id:(1 2 3 4 ... 50950)

The ids list is always different and not sequential.

The problem is that query performance is not so good, as you can imagine.

In some particular cases I'm able to do filtering based on different fields,
but in some cases (like 30-40% of all queries) I still end up with this
large id filter.

I'm looking for ways to improve this query's performance.

It doesn't seem like solr join could be applied there.

Another option that I found is to somehow use Lucene FieldCacheTermsFilter.
Is it worth a try?

Maybe I've missed some other options?








Re: Solr using a ridiculous amount of memory

2013-06-16 Thread adityab
It was interesting to read this post. I had a similar issue on Solr v4.2.1. The
nature of our documents is that they have huge multiValued fields, and we were
able to knock out our server in about 30 mins.
We then found a bug, "LUCENE-4995", which was causing all the problems.
Applying the patch has helped a lot.
Not sure it's related, but you might want to check that out.
Thanks. 





Re: Best way to match umlauts

2013-06-16 Thread adityab
Thanks for the explanation Steve. I now see it clearly. In my case it should
work. 





Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Jack Krupansky
Yeah, this is yet another "anti-pattern" we need to be discouraging - large 
multivalued fields. They indicate that the data model is not well balanced 
and aligned with the strengths of Solr and Lucene.


-- Jack Krupansky

-Original Message- 
From: adityab

Sent: Sunday, June 16, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

It was interesting to read this post. I had a similar issue on Solr v4.2.1. The
nature of our documents is that they have huge multiValued fields, and we were
able to knock out our server in about 30 mins.
We then found a bug, "LUCENE-4995", which was causing all the problems.
Applying the patch has helped a lot.
Not sure it's related, but you might want to check that out.
Thanks.






Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Jason Hellman
And let's not forget the interesting bug in MMapDirectory:

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/MMapDirectory.html

"NOTE: memory mapping uses up a portion of the virtual memory address space in 
your process equal to the size of the file being mapped. Before using this 
class, be sure you have plenty of virtual address space, e.g. by using a 64 
bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
address space. On 32 bit platforms also consult setMaxChunkSize(int) if you 
have problems with mmap failing because of fragmented address space. If you get 
an OutOfMemoryException, it is recommended to reduce the chunk size, until it 
works.
Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to 
close the underlying OS file handle. Only when GC finally collects the 
underlying objects, which could be quite some time later, will the file handle 
be closed.

This will consume additional transient disk usage: on Windows, attempts to 
delete or overwrite the files will result in an exception; on other platforms, 
which typically have a "delete on last close" semantics, while such operations 
will succeed, the bytes are still consuming space on disk. For many 
applications this limitation is not a problem (e.g. if you have plenty of disk 
space, and you don't rely on overwriting files on Windows) but it's still an 
important limitation to be aware of."


If you're measuring by directory size (and not explicitly by the viewable 
files) you may very well be seeing this.

Jason

On Jun 16, 2013, at 4:53 AM, Erick Erickson  wrote:

> Optimizing will _temporarily_ double the index size,
> but it shouldn't be permanent. Is it possible that
> you have inadvertently told Solr to keep an extra
> snapshot? I think it's "numberToKeep" in your
> replication handler, but I'm going from memory here.
> 
> Best
> Erick
> 
> 
> On Fri, Jun 14, 2013 at 2:15 AM, Montu v Boda
>  wrote:
>> Hi,
>> 
>> I have replicated my index from 1.4 to 3.5, and after replication I tried to
>> optimize the index in 3.5 with the URL below.
>> http://localhost:9002/solr35/collection1/update?optimize=true&commit=true
>> 
>> When I optimize the index in 3.5, it increases the index size to double.
>> 
>> In 1.4 the size of the index is 428GB, and after optimization in 3.5 it
>> becomes 791 GB.
>> 
>> Thanks & Regards
>> Montu v Boda
>> 
>> 
>> 



Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jan Høydahl
Hi,

I've never heard the complaint that Solr is hard to use. To the contrary, most 
people I come across have downloaded Solr themselves, walked through the 
tutorial and praise the simplicity with which they can start indexing and 
searching content.

When they come to us asking for consultancy or training, they are already in 
love with the product, they use it but realize that great search is so much 
more than just getting the HTTP requests or XML right. So while any "average 
Java developer" will be able to download and use Solr within an hour or two (my 
statement - even PHP developers can do that :-) ), that's just the beginning of 
it all.

With your reasoning, all software for which training classes exist is bad and 
hard to use. Our training classes do not focus on the technology itself, but 
best practices to achieve good search user experience *using* Solr. This is a 
skill not even seasoned SQL developers have.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15 June 2013 at 21:39, Jack Krupansky wrote:

> [My apologies to Roland for "hijacking" his original thread for this rant! 
> Look what you started!!]
> 
> And I will stand by my statement: "Solr is too much of a beast for average 
> app developers to master."
> 
> And the key word there, in case a too-casual reader missed it is "master" - 
> not "use" in the sense of hack something together or solving a niche 
> application for a typical Solr deployment, but master in the sense of having 
> a high level of confidence about the vast bulk (even if not absolutely 100%) 
> of the subject matter, Solr itself.
> 
> I mean, generally, on average what percentage of Solr's many features  has 
> the average Solr app-deployer actually "mastered"?
> 
> And, what I am really referring to is not what expertise the pioneers and 
> "expert" Solr solution consultants have had, but the level of expertise 
> required for those who are to come in the years ahead who simply want to 
> focus on their application without needing to become a "Solr expert" first.
> 
> The context of my statement was the application "devs" referenced earlier in 
> this thread who were struggling because the Solr API was not 100% pure 
> RESTful. As the respondent indicated, they were much happier to have a 
> cleaner, more RESTful API that they as app developers can deal with, so that 
> they wouldn't have to "master" all of the bizarre inconsistencies of Solr 
> itself (e.g., just the knowledge that SolrCell doesn't support partial/atomic 
> update.)
> 
> And, the real focus of my statement, again in this particular context" is the 
> actual application devs, the guys focused on the actual application subject 
> matter itself, not the "Solr Experts" or "Solr solution architects" who do 
> have a lot higher mastery of Solr than the "average" application devs.
> 
> And if my statement were in fact false, questions such as began this thread 
> would never have come up. The level of traffic for Solr User would be 
> essentially zero if it were really true that average application developers 
> can easily "master" Solr.
> 
> And there would be zero need for so many of these Solr training classes if Solr 
> were so easy to "master". In fact, the very existence of so many Solr 
> training classes effectively proves my point. And that's just for "basic" 
> Solr, not any of the many esoteric points such as at the heart of this 
> particular thread (i.e., SolrCell not supporting partial/atomic update.)
> 
> And, in conclusion, my real interest is in helping the many "average" 
> application developers who post inquiries on this Solr user list for the 
> simple reason that they ARE in fact "struggling" with Solr.
> 
> Personally, I would suggest that a typical (average) successful deployer of 
> Solr would be more readily characterized as having "survived" the Solr 
> deployment process rather than having achieved a truly deep "mastery" of 
> Solr. They may have achieved confidence about exactly what they have 
> deployed, but do they also have great confidence that they know exactly what 
> will happen if they make slight and subtle changes or what exactly the fix 
> will be for certain runtime errors? For the "average application developer" 
> I'm talking about, not the elite expert Solr consultants.
> 
> One final way of putting it. If a manager or project leader wanted to staff a 
> dev position to be "in-house Solr expert", can they just hire any old average 
> Java programmer with no Solr experience and expect that he will rapidly 
> "master" Solr?
> 
> I mean, why would so many recruiters be looking for a "Solr expert" or 
> engaging the services of Solr consultancies if mastery of Solr by "average 
> application developers" was a reality?!
> 
> [I want to hear Otis' take on this!]
> 
> -- Jack Krupansky
> 
> -Original Message- From: Grant Ingersoll
> Sent: Saturday, June 15, 2013 1:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML

RE: filter query from external list of Solr unique IDs

2013-06-16 Thread samabhiK
Does anything exist already in Solr 4.3 to meet this use case scenario?





Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
Jan, you made no mention of "mastering" Solr - which was the crux of my 
comments.


I think everyone agrees that anyone can download and "use" Solr, in a basic 
sense, with minimal effort. The issue is how far the average application 
developer can get beyond "start" towards "mastery" without a detailed cheat 
sheet and eventually intensive guidance, if not outright exasperation and 
pain. How many of the many thousands of Solr deployments didn't hit some 
kind of wall where they had the impression that Solr should be able to do 
something easily and found that was not the case (multi-word synonyms come 
to mind.)


Oh, and yes, by my standards, MOST software IS "bad" and "hard to use". The 
level of training and books is certainly an indicator of the level of 
"badness". Some of Solr is indeed "not so bad" - while other parts have 
at least some elements of "extreme badness" (an NPE for a missing or invalid 
parameter is a mark of extreme badness.)


[Again, my apologies to Roland - none of these comments reflect on his 
original inquiry! Except, that Solr's divergence from a true, pure REST API 
is certainly one of the elements of its "badness". The fact that SolrCell 
does not support partial update as a true REST CRUD API should, is a good 
example of relative "badness" in Solr.]


-- Jack Krupansky

-Original Message- 
From: Jan Høydahl

Sent: Sunday, June 16, 2013 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

Hi,

I've never heard the complaint that Solr is hard to use. To the contrary, 
most people I come across have downloaded Solr themselves, walked through 
the tutorial and praise the simplicity with which they can start indexing 
and searching content.


When they come to us asking for consultancy or training, they are already in 
love with the product, they use it but realize that great search is so much 
more than just getting the HTTP requests or XML right. So while any "average 
Java developer" will be able to download and use Solr within an hour or two 
(my statement - even PHP developers can do that :-) ), that's just the 
beginning of it all.


With your reasoning, all software for which training classes exist is bad 
and hard to use. Our training classes do not focus on the technology itself, 
but best practices to achieve good search user experience *using* Solr. This 
is a skill not even seasoned SQL developers have.


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15 June 2013 at 21:39, Jack Krupansky wrote:

[My apologies to Roland for "hijacking" his original thread for this rant! 
Look what you started!!]


And I will stand by my statement: "Solr is too much of a beast for average 
app developers to master."


And the key word there, in case a too-casual reader missed it is 
"master" - not "use" in the sense of hack something together or solving a 
niche application for a typical Solr deployment, but master in the sense 
of having a high level of confidence about the vast bulk (even if not 
absolutely 100%) of the subject matter, Solr itself.


I mean, generally, on average what percentage of Solr's many features  has 
the average Solr app-deployer actually "mastered"?


And, what I am really referring to is not what expertise the pioneers and 
"expert" Solr solution consultants have had, but the level of expertise 
required for those who are to come in the years ahead who simply want to 
focus on their application without needing to become a "Solr expert" 
first.


The context of my statement was the application "devs" referenced earlier 
in this thread who were struggling because the Solr API was not 100% pure 
RESTful. As the respondent indicated, they were much happier to have a 
cleaner, more RESTful API that they as app developers can deal with, so 
that they wouldn't have to "master" all of the bizarre inconsistencies of 
Solr itself (e.g., just the knowledge that SolrCell doesn't support 
partial/atomic update.)


And, the real focus of my statement, again in this particular context" is 
the actual application devs, the guys focused on the actual application 
subject matter itself, not the "Solr Experts" or "Solr solution 
architects" who do have a lot higher mastery of Solr than the "average" 
application devs.


And if my statement were in fact false, questions such as began this 
thread would never have come up. The level of traffic for Solr User would 
be essentially zero if it were really true that average application 
developers can easily "master" Solr.


And there would be zero need for so many of these Solr training classes if 
Solr were so easy to "master". In fact, the very existence of so many Solr 
training classes effectively proves my point. And that's just for "basic" 
Solr, not any of the many esoteric points such as at the heart of this 
particular thread (i.e., SolrCell not supporting partial/atomic update.)


And, in conclusion, my real interest is in helping the many "average" 
application developers who post inquiries on this Solr user list for the 
simple reason that they ARE in fact "struggling" with Solr.

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Yonik Seeley
On Sun, Jun 16, 2013 at 6:05 PM, Jack Krupansky  wrote:
> Except, that Solr's divergence from a true, pure REST API is certainly one
> of the elements of its "badness".

Most complex systems seem to feel the need to diverge from pure REST
for the sake of being practical.
From that perspective "pure REST" can be a disadvantage.  We need to
be more specific about the advantages/disadvantages of specific points
in an API.

-Yonik
http://lucidworks.com


Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
Exactly. For the case in point that is the real, underlying subject of this 
thread, the desire is to partially update an existing Solr document using 
the output of SolrCell/Tika.


With true/pure REST, that should be the HTTP "PUT" verb. And the path would 
indicate the collection and key value. Maybe something like:


PUT http://localhost:8983/solr/collections/example/id/doc-23?format=SolrCell...


And to delete a doc:

DELETE http://localhost:8983/solr/collections/example/id/doc-23

And to add or completely replace a single doc:

POST http://localhost:8983/solr/collections/example/id/doc-23?format=SolrCell...

or
POST http://localhost:8983/solr/collections/example?format=SolrCell&id=doc-23


Not that I'm seriously suggesting that Solr should be changed in this way at 
this stage. I'm simply indicating one place where a closer-to-pure REST API 
would have made it more clear what the application developer's intentions 
were - and maybe have completely avoided this entire thread.


-- Jack Krupansky

-Original Message- 
From: Yonik Seeley

Sent: Sunday, June 16, 2013 6:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

On Sun, Jun 16, 2013 at 6:05 PM, Jack Krupansky  
wrote:

Except, that Solr's divergence from a true, pure REST API is certainly one
of the elements of its "badness".


Most complex systems seem to feel the need to diverge from pure REST
for the sake of being practical.
From that perspective "pure REST" can be a disadvantage.  We need to
be more specific about the advantages/disadvantages of specific points
in an API.

-Yonik
http://lucidworks.com 



Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Yago Riveiro
I share Yonik's opinion that a pure REST application is in some cases a pain
in the ass.

But as Jack noted, there are some cases where REST is more expressive and it is
easier to understand what you are doing.

At this point, I think it is more important to make the actual API more stable
and functional, but as a background task we should think about how to transform
it into an (almost) REST API.

Regards.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Sunday, June 16, 2013 at 11:54 PM, Jack Krupansky wrote:

> Exactly. For the case in point that is the real, underlying subject of this 
> thread, the desire is to partially update an existing Solr document using 
> the output of SolrCell/Tika.
> 
> With true/pure REST, that should be the HTTP "PUT" verb. And the path would 
> indicate the collection and key value. Maybe something like:
> 
> PUT http://localhost:8983/solr/collections/example/id/doc-23?format=SolrCell...
> 
> And to delete a doc:
> 
> DELETE http://localhost:8983/solr/collections/example/id/doc-23
> 
> And to add or completely replace a single doc:
> 
> POST http://localhost:8983/solr/collections/example/id/doc-23?format=SolrCell...
> or
> POST http://localhost:8983/solr/collections/example?format=SolrCell&id=doc-23
> 
> Not that I'm seriously suggesting that Solr should be changed in this way at 
> this stage. I'm simply indicating one place where a closer-to-pure REST API 
> would have made it more clear what the application developer's intentions 
> were - and maybe have completely avoided this entire thread.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Yonik Seeley
> Sent: Sunday, June 16, 2013 6:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
> 
> On Sun, Jun 16, 2013 at 6:05 PM, Jack Krupansky <j...@basetechnology.com> 
> wrote:
> > Except, that Solr's divergence from a true, pure REST API is certainly one
> > of the elements of its "badness".
> > 
> 
> 
> Most complex systems seem to feel the need to diverge from pure REST
> for the sake of being practical.
> From that perspective "pure REST" can be a disadvantage. We need to
> be more specific about the advantages/disadvantages of specific points
> in an API.
> 
> -Yonik
> http://lucidworks.com 
> 
> 




Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Walter Underwood
1. Total mastery of a product is a strange requirement. That would be a 
huge trivia contest that would include all the vestigial bad bits. For example, 
I feel no need to master the Porter stemmer. I have no idea how to do geo 
search in Solr, though I'm sure I could learn it pretty quickly if needed.

2. Someone who expects partial update in a search engine, or transactions, has 
a deep misunderstanding of the tradeoffs you make for what search can do. That 
isn't mastery of arcane details, that is search 101.

Here are Rob Pike's rules for a good software architecture:

1. Simple things are simple.
2. Hard things are possible.
3. You don't need to understand the entire system to use part of it.

I think Solr comes pretty close to that. It doesn't do as well on #1 as 
Ultraseek did, but it is better on #2.

If you really need search with transactions with field updates, that is really 
hard. You can buy it from Mark Logic. It works great and they charge what it is 
worth.

wunder
Former Principal Architect Infoseek/Inktomi/Verity Ultraseek
Former Search Guy Netflix
Search Guy Chegg

On Jun 16, 2013, at 3:05 PM, Jack Krupansky wrote:

> Jan, you made no mention of "mastering" Solr - which was the crux of my 
> comments.
> 
> I think everyone agrees that anyone can download and "use" Solr, in a basic 
> sense, with minimal effort. The issue is how far the average application 
> developer can get beyond "start" towards "mastery" without a detailed cheat 
> sheet and eventually intensive guidance, if not outright exasperation and 
> pain. How many of the many thousands of Solr deployments didn't hit some kind 
> of wall where they had the impression that Solr should be able to do 
> something easily and found that was not the case (multi-word synonyms come to 
> mind.)
> 
> Oh, and yes, by my standards, MOST software IS "bad" and "hard to use". The 
> level of training and books is certainly an indicator of the level of 
> "badness". Some of Solr is indeed "not so bad" - while other parts have 
> at least some elements of "extreme badness" (an NPE for a missing or invalid 
> parameter is a mark of extreme badness.)
> 
> [Again, my apologies to Roland - none of these comments reflect on his 
> original inquiry! Except, that Solr's divergence from a true, pure REST API 
> is certainly one of the elements of its "badness". The fact that SolrCell 
> does not support partial update as a true REST CRUD API should, is a good 
> example of relative "badness" in Solr.]
> 
> -- Jack Krupansky
> 
> -Original Message- From: Jan Høydahl
> Sent: Sunday, June 16, 2013 4:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
> 
> Hi,
> 
> I've never heard the complaint that Solr is hard to use. To the contrary, 
> most people I come across have downloaded Solr themselves, walked through the 
> tutorial and praise the simplicity with which they can start indexing and 
> searching content.
> 
> When they come to us asking for consultancy or training, they are already in 
> love with the product, they use it but realize that great search is so much 
> more than just getting the HTTP requests or XML right. So while any "average 
> Java developer" will be able to download and use Solr within an hour or two 
> (my statement - even PHP developers can do that :-) ), that's just the 
> beginning of it all.
> 
> With your reasoning, all software for which training classes exist is bad 
> and hard to use. Our training classes do not focus on the technology itself, 
> but best practices to achieve good search user experience *using* Solr. This 
> is a skill not even seasoned SQL developers have.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 15 June 2013 at 21:39, Jack Krupansky wrote:
> 
>> [My apologies to Roland for "hijacking" his original thread for this rant! 
>> Look what you started!!]
>> 
>> And I will stand by my statement: "Solr is too much of a beast for average 
>> app developers to master."
>> 
>> And the key word there, in case a too-casual reader missed it is "master" - 
>> not "use" in the sense of hack something together or solving a niche 
>> application for a typical Solr deployment, but master in the sense of having 
>> a high level of confidence about the vast bulk (even if not absolutely 100%) 
>> of the subject matter, Solr itself.
>> 
>> I mean, generally, on average what percentage of Solr's many features  has 
>> the average Solr app-deployer actually "mastered"?
>> 
>> And, what I am really referring to is not what expertise the pioneers and 
>> "expert" Solr solution consultants have had, but the level of expertise 
>> required for those who are to come in the years ahead who simply want to 
>> focus on their application without needing to become a "Solr expert" first.
>> 
>> The context of my statement was the application "devs" referenced earlier in 
>> this thread who were struggling because the Solr API was not 100% pure RESTful.

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Alexandre Rafalovitch
On Sun, Jun 16, 2013 at 7:27 PM, Walter Underwood  wrote:
> 2. Someone who expects partial update in a search engine, or transactions, 
> has a deep misunderstandings of the tradeoffs you make for what search can 
> do. That isn't mastery of arcane details, that is search 101.

Yes, they might (have a misunderstanding). But between coming from SQL
'like' and hearing Solr being called NoSQL, you would not be surprised
if that happened. I, for one, don't remember seeing a 'search 101'
document that outlines those assumptions. And without it, how would
one be expected to know those, except through some sort of mastery of
_any_ search engine? And, if Solr is the _first_ such search engine,
we just got ourselves back into catch-22 situation.

So, I am with Jack on this one.

Regards,
   Alex.



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Otis Gospodnetic
Serious thread hiJacking here

Hey, why was I singled out? ;)

I don't have time to get deep into this (there are non-experts I need
to help! kidding...) , but I'll say this:
* Do you know any non-trivial piece of software in which an average
developer is a master?  I've managed to master the `man' command!
* How about life?  Have the average people mastered it?  I know I'm
far from it.  How about you? :)  Solr is larger than life, so why
would it be any different?
* Jack, once your book is out, maybe the road to Solr mastery will be
just a Solr Bible away!
* Patches welcome :)

Joking aside, Solr is non-trivial, but I think an average dev can get it.
It's not rocket science.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/  <== experts inside ;)





On Sat, Jun 15, 2013 at 3:39 PM, Jack Krupansky  wrote:
> [My apologies to Roland for "hijacking" his original thread for this rant!
> Look what you started!!]
>
> And I will stand by my statement: "Solr is too much of a beast for average
> app developers to master."
>
> And the key word there, in case a too-casual reader missed it is "master" -
> not "use" in the sense of hack something together or solving a niche
> application for a typical Solr deployment, but master in the sense of having
> a high level of confidence about the vast bulk (even if not absolutely 100%)
> of the subject matter, Solr itself.
>
> I mean, generally, on average what percentage of Solr's many features  has
> the average Solr app-deployer actually "mastered"?
>
> And, what I am really referring to is not what expertise the pioneers and
> "expert" Solr solution consultants have had, but the level of expertise
> required for those who are to come in the years ahead who simply want to
> focus on their application without needing to become a "Solr expert" first.
>
> The context of my statement was the application "devs" referenced earlier in
> this thread who were struggling because the Solr API was not 100% pure
> RESTful. As the respondent indicated, they were much happier to have a
> cleaner, more RESTful API that they as app developers can deal with, so that
> they wouldn't have to "master" all of the bizarre inconsistencies of Solr
> itself (e.g., just the knowledge that SolrCell doesn't support
> partial/atomic update.)
>
> And, the real focus of my statement, again in this particular context" is
> the actual application devs, the guys focused on the actual application
> subject matter itself, not the "Solr Experts" or "Solr solution architects"
> who do have a lot higher mastery of Solr than the "average" application
> devs.
>
> And if my statement were in fact false, questions such as began this thread
> would never have come up. The level of traffic for Solr User would be
> essentially zero if it were really true that average application developers
> can easily "master" Solr.
>
> And there would be zero need for so many of these Solr training classes if Solr
> were so easy to "master". In fact, the very existence of so many Solr
> training classes effectively proves my point. And that's just for "basic"
> Solr, not any of the many esoteric points such as at the heart of this
> particular thread (i.e., SolrCell not supporting partial/atomic update.)
>
> And, in conclusion, my real interest is in helping the many "average"
> application developers who post inquiries on this Solr user list for the
> simple reason that they ARE in fact "struggling" with Solr.
>
> Personally, I would suggest that a typical (average) successful deployer of
> Solr would be more readily characterized as having "survived" the Solr
> deployment process rather than having achieved a truly deep "mastery" of
> Solr. They may have achieved confidence about exactly what they have
> deployed, but do they also have great confidence that they know exactly what
> will happen if they make slight and subtle changes or what exactly the fix
> will be for certain runtime errors? For the "average application developer"
> I'm talking about, not the elite expert Solr consultants.
>
> One final way of putting it. If a manager or project leader wanted to staff
> a dev position to be "in-house Solr expert", can they just hire any old
> average Java programmer with no Solr experience and expect that he will
> rapidly "master" Solr?
>
> I mean, why would so many recruiters be looking for a "Solr expert" or
> engaging the services of Solr consultancies if mastery of Solr by "average
> application developers" was a reality?!
>
> [I want to hear Otis' take on this!]
>
> -- Jack Krupansky
>
> -Original Message- From: Grant Ingersoll
> Sent: Saturday, June 15, 2013 1:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
>
>
> On Jun 15, 2013, at 12:54 PM, Alexandre Rafalovitch 
> wrote:
>
>> On Sat, Jun 15, 2013 at 10:35 AM, Grant Ingersoll 
>> wrote:
>>>
>>> That being said, it truly amazes me that people were ever able to
>>> implement Solr, given some of the FUD in this thread.

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Lance Norskog
No, they just learned a few features and then stopped because it was 
"good enough", and they had a thousand other things to code.


As to REST - yes, it is worth having a coherent API. Solr is behind the 
curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy 
name) but it provides a lot of goodness - the API tells you how to use 
it. For example, a search page response includes a link for the next 
page; your UI finds the link and hangs it off a 'Next' button. Your UI 
does not need code for 'create a Next link'.


Also, don't do that /v1 crap. At this point we all know how it should work.

On 06/15/2013 07:35 AM, Grant Ingersoll wrote:

On Jun 13, 2013, at 11:24 AM, Walter Underwood  wrote:


That was my thought exactly. Contribute a REST request handler. --wunder


+1.  The bits are already in place for a lot of it now that RESTlet is in.

That being said, it truly amazes me that people were ever able to implement 
Solr, given some of the FUD in this thread.  I guess those tens of thousands of 
deployments out there were all done by above average devs...

-Grant




Re: Best way to match umlauts

2013-06-16 Thread Lance Norskog
One small thing: German u-umlaut is often "flattened" as 'ue' instead of 
'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if 
Lucene has a good solution for this problem.
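
(Lucene does ship something close: GermanNormalizationFilter, which besides
folding the plain umlauts also maps 'ae'/'oe' to 'a'/'o', and 'ue' to 'u'
except after a vowel or q. A fieldType sketch, names illustrative:

  <fieldType name="text_de_folded" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.GermanNormalizationFilterFactory"/>
    </analyzer>
  </fieldType>
)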


On 06/16/2013 06:44 AM, adityab wrote:

Thanks for the explanation Steve. I now see it clearly. In my case it should
work.







Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
I won't assert "total" mastery as a requirement. Degrees of mastery are 
sufficient. But even then, even "partial" mastery of some rather basic areas 
of Solr can be quite daunting.


It is enlightening to consider just how many nooks and crannies of Solr 
there are to master, and how many reasonable levels of mastery there are.


Spatial... the final frontier.

-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Sunday, June 16, 2013 7:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

1. Total mastery of a product is a strange requirement. That would be 
a huge trivia contest that would include all the vestigial bad bits. For 
example, I feel no need to master the Porter stemmer. I have no idea how to 
do geo search in Solr, though I'm sure I could learn it pretty quickly if 
needed.


2. Someone who expects partial update in a search engine, or transactions, 
has a deep misunderstanding of the tradeoffs you make for what search can 
do. That isn't mastery of arcane details, that is search 101.


Here are Rob Pike's rules for a good software architecture:

1. Simple things are simple.
2. Hard things are possible.
3. You don't need to understand the entire system to use part of it.

I think Solr comes pretty close to that. It doesn't do as well on #1 as 
Ultraseek did, but it is better on #2.


If you really need search with transactions with field updates, that is 
really hard. You can buy it from Mark Logic. It works great and they charge 
what it is worth.


wunder
Former Principal Architect Infoseek/Inktomi/Verity Ultraseek
Former Search Guy Netflix
Search Guy Chegg

On Jun 16, 2013, at 3:05 PM, Jack Krupansky wrote:

Jan, you made no mention of "mastering" Solr - which was the crux of my 
comments.


I think everyone agrees that anyone can download and "use" Solr, in a 
basic sense, with minimal effort. The issue is how far the average 
application developer can get beyond "start" towards "mastery" without a 
detailed cheat sheet and eventually intensive guidance, if not outright 
exasperation and pain. How many of the many thousands of Solr deployments 
didn't hit some kind of wall where they had the impression that Solr 
should be able to do something easily and found that was not the case 
(multi-word synonyms come to mind.)


Oh, and yes, by my standards, MOST software IS "bad" and "hard to use". 
The level of training and books is certainly an indicator of the level of 
"badness". Some of Solr is indeed "not so bad" - while other parts 
have at least some elements of "extreme badness" (an NPE for a missing or 
invalid parameter is a mark of extreme badness.)


[Again, my apologies to Roland - none of these comments reflect on his 
original inquiry! Except, that Solr's divergence from a true, pure REST 
API is certainly one of the elements of its "badness". The fact that 
SolrCell does not support partial update as a true REST CRUD API should, 
is a good example of relative "badness" in Solr.]


-- Jack Krupansky

-Original Message- From: Jan Høydahl
Sent: Sunday, June 16, 2013 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

Hi,

I've never heard the complaint that Solr is hard to use. To the contrary, 
most people I come across have downloaded Solr themselves, walked through 
the tutorial and praise the simplicity with which they can start indexing 
and searching content.


When they come to us asking for consultancy or training, they are already 
in love with the product, they use it but realize that great search is so 
much more than just getting the HTTP requests or XML right. So while any 
"average Java developer" will be able to download and use Solr within an 
hour or two (my statement - even PHP developers can do that :-) ), that's 
just the beginning of it all.


With your reasoning, all software for which training classes exist is bad 
and hard to use. Our training classes do not focus on the technology 
itself, but best practices to achieve good search user experience *using* 
Solr. This is a skill not even seasoned SQL developers have.


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15 June 2013 at 21:39, Jack Krupansky wrote:

[My apologies to Roland for "hijacking" his original thread for this 
rant! Look what you started!!]


And I will stand by my statement: "Solr is too much of a beast for 
average app developers to master."


And the key word there, in case a too-casual reader missed it is 
"master" - not "use" in the sense of hack something together or solving a 
niche application for a typical Solr deployment, but master in the sense 
of having a high level of confidence about the vast bulk (even if not 
absolutely 100%) of the subject matter, Solr itself.


I mean, generally, on average what percentage of Solr's many features 
has the average Solr app-deployer actually "mastered"?


And, what I am really referring to is not what expertise the pioneers and 
"expert" Solr solution consultants have had, but the level of expertise 
required for those who are to come in the years ahead who simply want to 
focus on their application without needing to become a "Solr expert" first.

sort=geodist() asc

2013-06-16 Thread William Bell
This simple feature of "sort=geodist() asc" is very powerful since it
enables us to move from SOLR 3 to SOLR 4 without rewriting all our queries.

We also use boost=geodist() in some cases, and some bf/bq.

bf=recip(geodist(),2,200,20)&sort=score desc

OR

boost=recip(geodist(),2,200,20)&sort=score desc

I know it specifically says it won't work for geohash multivalue points,
but what would it take to do it?
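
(For reference, the working single-valued form looks like this; parameters
illustrative:

  &sfield=store&pt=45.15,-93.85&sort=geodist() asc

geodist() reads one point per document from the field cache, which is why the
multivalued geohash case would need a per-document reduction, e.g. distance to
the nearest of the points.)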




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


SOLR 4.3.1?

2013-06-16 Thread William Bell
When is 4.3.1 coming out?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SOLR 4.3.1?

2013-06-16 Thread Anshum Gupta
It's already cut and the vote has been passed. It should be out any time
now.


On Mon, Jun 17, 2013 at 11:26 AM, William Bell  wrote:

> When is 4.3.1 coming out?
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 

Anshum Gupta
http://www.anshumgupta.net