Re: location of the files created by zookeeper?

2014-05-16 Thread Steve McKay
The config is stored in ZooKeeper. 
/configs/myconf/velocity/pagination_bottom.vm is a ZooKeeper path, not a 
filesystem path. The data on disk is in ZK's binary format. Solr uses the ZK 
client library to talk to the embedded server and read config data.
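
If you want to pull one of those files back out yourself, here's a rough sketch using the same client library (assuming the embedded ZK is on localhost:9983, i.e. the Solr port plus 1000; the path below is a ZK path):

import org.apache.solr.common.cloud.SolrZkClient;

public class DumpZkFile {
    public static void main(String[] args) throws Exception {
        // Embedded ZK defaults to the Solr port + 1000, e.g. 9983 for Solr on 8983.
        SolrZkClient zk = new SolrZkClient("localhost:9983", 10000);
        try {
            // A ZooKeeper path, not a filesystem path.
            byte[] data = zk.getData("/configs/myconf/velocity/pagination_bottom.vm",
                    null, null, true);
            System.out.println(new String(data, "UTF-8"));
        } finally {
            zk.close();
        }
    }
}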

On May 16, 2014, at 2:47 AM, Aman Tandon  wrote:

> Any help here??
> 
> With Regards
> Aman Tandon
> 
> 
> On Thu, May 15, 2014 at 10:17 PM, Aman Tandon wrote:
> 
>> Hi,
>> 
>> Can anybody tell me where the embedded ZooKeeper keeps your config
>> files? When we specify the configName while starting SolrCloud, it
>> gives that name to the directory, as guessed from the Solr logs.
>> 
>> 
>> 
>> 
>> 
>> 
>> 4409 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/footer.vm
>> 4456 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_bottom.vm
>> 4479 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/head.vm
>> 4530 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_top.vm
>> 4555 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/VM_global_library.vm
>> 4599 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/suggest.vm
>> 
>> 
>> With Regards
>> Aman Tandon
>> 



Re: Using embedded zookeeper to make an ensemble

2014-05-16 Thread Steve McKay
Doing this doesn't avoid the need to configure and administrate ZK. Running a 
special snowflake setup to avoid downloading a tar.gz doesn't seem like a good 
trade-off to me.
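
For reference, the sort of invocation being described (the SolrCloud wiki's ensemble example) is roughly the following, untested sketch; each embedded ZK listens on its Solr port plus 1000, and those are the ports that go in zkHost:

java -DzkRun -DzkHost=host1:9983,host2:9983,host3:9983 -jar start.jar   # on host1
java -DzkRun -DzkHost=host1:9983,host2:9983,host3:9983 -jar start.jar   # on host2
java -DzkRun -DzkHost=host1:9983,host2:9983,host3:9983 -jar start.jar   # on host3

You still end up configuring and babysitting three ZooKeeper servers, just with fewer knobs exposed.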

On May 15, 2014, at 3:27 PM, Upayavira  wrote:

> Hi,
> 
> I need to set up a zookeeper ensemble. I could download Zookeeper and do
> it that way. I already have everything I need to run Zookeeper within a
> Solr install.
> 
> Is it possible to run a three node zookeeper ensemble by starting up
> three Solr nodes with Zookeeper enabled? Obviously, I'd only use these
> nodes for their Zookeeper, and keep their indexes empty.
> 
> I've made some initial attempts, and whilst it looks like it might be
> possible with -DzkRun and -DzkHost=, I haven't yet succeeded.
> 
> I think this could be a much easier way for people familiar with Solr to
> get an ensemble up compared to downloading the Zookeeper distribution.
> 
> Thoughts?
> 
> Upayavira



Re: SolrCloud Nodes autoSoftCommit and (temporary) missing documents

2014-05-25 Thread Steve McKay
Solr can add the filter for you, via an appends section on the search handler in solrconfig.xml:

<lst name="appends">
  <str name="fq">timestamp:[* TO NOW-30SECOND]</str>
</lst>

Increasing soft commit frequency isn't a bad idea, though. I'd probably do 
both. :)

On May 23, 2014, at 6:51 PM, Michael Tracey  wrote:

> Hey all,
> 
> I've got a number of nodes (Solr 4.4 Cloud) that I'm balancing with HaProxy 
> for queries.  I'm indexing pretty much constantly, and have autoCommit and 
> autoSoftCommit on for Near Realtime Searching.  All works nicely, except that 
> occasionally the auto-commit cycles are far enough off that one node will 
> return a document that another node doesn't.  I don't want to have to add 
> something like this: timestamp:[* TO NOW-30MINUTE] to every query to make 
> sure that all the nodes have the record.  Ideas? autoSoftCommit more often?
> 
>  
>   10 
>   720 
>   false 
> 
> 
>  
>   3 
>   5000
>  
> 
> Thanks,
> 
> M.



Re: Does CloudSolrServer hit zookeeper for every request?

2014-06-02 Thread Steve McKay
ZooKeeper allows clients to put watches on paths in the ZK tree. When the 
cluster state changes, every Solr client is notified by the ZK server and then 
each client reads the updated state. No polling is needed or even helpful.

In any event, reading from ZK is much more lightweight than writing, because 
the ZK server keeps all its data in memory and doesn’t have to go through the 
consensus rigamarole required for a write.
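
A rough SolrJ sketch of the intended usage (collection name and ZK addresses are placeholders): build one CloudSolrServer, reuse it for every request, and let the watches keep its view of the cluster current.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientExample {
    public static void main(String[] args) throws Exception {
        // One client instance for the life of the application.
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        // Each query reuses the cached, watch-maintained cluster state;
        // ZooKeeper is not consulted per request.
        for (int i = 0; i < 3; i++) {
            QueryResponse rsp = solr.query(new SolrQuery("*:*"));
            System.out.println("numFound=" + rsp.getResults().getNumFound());
        }
        solr.shutdown();
    }
}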

On Jun 2, 2014, at 5:17 PM, Jim.Musil  wrote:

> I’m curious how CloudSolrServer works in practice.
> 
> I understand that it gets the active solr nodes from zookeeper, but does it 
> do this for every request?
> 
> If it does hit zk for every request, that seems to put a lot of pressure on 
> the zk ensemble.
> 
> If it does NOT hit zk for every request, then how does it detect changes in 
> the number of nodes and the status of the nodes?
> 
> Thanks!
> Jim M.



Re: Recommended ZooKeeper topology in Production

2014-06-10 Thread Steve McKay
Dedicated machines are a good idea. The main thing is to make sure that ZK 
always has IOPS available for transaction log writes. That's easy to ensure 
when each ZK instance has its own hardware. The standard practice, as far as I 
know, is to have 3 physical boxes spread among racks/datacenters/continents as 
HA needs dictate.

Sharing a machine between Solr and ZK is definitely not ideal. Instead of Solr 
machines and ZK machines, now you have Solr machines and Solr+ZK machines. It 
adds management overhead because now you have to take ZK into account while 
administering your Solr cluster, and unless you give ZK its own disk it will 
have to compete with Solr for I/O.
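
For what it's worth, the relevant bits of a zoo.cfg for such a 3-node ensemble might look like this sketch (hostnames and paths are placeholders; each box also needs a myid file in dataDir):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/var/lib/zookeeper/data
# Keep the transaction log on its own disk so nothing competes with log writes.
dataLogDir=/zookeeper-txlog
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888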

On Jun 10, 2014, at 2:58 AM, Gili Nachum  wrote:

> Is there a recommended ZooKeeper topology for production Solr environments?
> 
> I was planning: 3 ZK nodes, each on its own dedicated machine.
> 
> Thinking that dedicated machines, separate from Solr servers, would keep ZK
> isolated from resource contention spikes that may occur on Solr. Also, if a
> Solr machine goes down, there would still be 3 ZK nodes to handle the event
> properly.
> 
> If I want to save on resources, is placing each ZK instance on the same box as
> a Solr instance considered common practice in production environments?
> 
> Thanks!



Re: CopyField can't copy analyzers and Filters

2014-06-30 Thread Steve McKay
Three fields: AllChamp_ar, AllChamp_fr, AllChamp_en. Then query them with 
dismax.
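
Roughly like this sketch, assuming the stock text_ar/text_fr/text_en field types and a language suffix on the source fields (adjust to your schema):

<field name="AllChamp_ar" type="text_ar" indexed="true" stored="false" multiValued="true"/>
<field name="AllChamp_fr" type="text_fr" indexed="true" stored="false" multiValued="true"/>
<field name="AllChamp_en" type="text_en" indexed="true" stored="false" multiValued="true"/>

<copyField source="*_ar" dest="AllChamp_ar"/>
<copyField source="*_fr" dest="AllChamp_fr"/>
<copyField source="*_en" dest="AllChamp_en"/>

Then query something like:

q=présenton&defType=dismax&qf=AllChamp_ar AllChamp_fr AllChamp_en

Each copy target analyzes the text with its own language's analyzer, and dismax searches all three at once.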

On Jun 30, 2014, at 11:53 AM, benjelloun  wrote:

> here is my schema: 
> 
>  required="false" stored="false"/>
>  required="false" multiValued="true"/>
> 
>  required="false" multiValued="true"/>
> 
>  required="false" multiValued="true"/>
> 
> 
> 
> 
> 
> When I index documents and then search on the field "AllChamp", the
> analyzers and filters are not applied.
> I know that copyField can't copy analyzers and filters, so how can I keep
> the analyzer and filter on the field "AllChamp"?
> 
> Example: 
> 
> I search for: AllChamp:presenton  --> num results=0 
>   AllChamp:présenton  --> num results=1 
> 
> thanks for help, 
> best regards, 
> Anass BENJELLOUN 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/CopyField-can-t-copy-analyzers-and-Filters-tp4144803.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Steve McKay
Seconding this. Solr works fine on Jetty. Solr also works fine on Tomcat. The 
Solr community largely uses Jetty, so most of the resources on the Web are for 
running Solr on Jetty, but if you have a reason to use Tomcat and know what 
you're doing then Tomcat is a fine choice.

On Jun 30, 2014, at 11:58 AM, Erick Erickson  wrote:

> The only thing I would add is that if you _already_
> are a tomcat shop and have considerable
> expertise running Tomcat, it might just be easier
> to stick with what you know.
> 
> But if you have a choice, Jetty is where I'd go.
> 
> Best,
> Erick
> 
> On Mon, Jun 30, 2014 at 4:06 AM, Otis Gospodnetic
>  wrote:
>> Hi Gurunath,
>> 
>> In 90% of our engagements with various Solr customers we see Jetty, which
>> we also recommend and use ourselves for Solr + our own services and
>> products.
>> 
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>> 
>> 
>> 
>> On Mon, Jun 30, 2014 at 5:07 AM, gurunath  wrote:
>> 
>>> Hi,
>>> 
>>> I'm confused by the many reviews of Jetty and Tomcat together with Solr 4.7. Is
>>> there a better option for production? I want to know the complexities with
>>> Tomcat and Jetty going forward, as I want to build a cluster with huge data on Solr.
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 



Re: Indexing non-stored fields

2014-06-30 Thread Steve McKay
Stored doesn't mean "stored to disk", more like "stored verbatim". When you 
index a field, Solr analyzes the field value and makes it part of the index. 
The index is persisted to disk when you commit, which is why it sticks around 
after a restart. Searching the index, mapping from search terms to doc ids, is 
very fast. However, the index is very very bad at going in reverse, from doc 
ids to terms. That's where stored fields come in. When you store a field, Solr 
takes the field value and stores the entire value separate from the index. This 
makes it trivial to get the value for a particular doc id, but it's terrible 
for searching.

So the stored attribute and the indexed attribute have different purposes. 
Indexed means you want to be able to search on the value, and stored means you 
want to be able to see the value in search results.
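
For example, in schema.xml terms (the thumbnail field is hypothetical):

<!-- searchable, but the raw value is not returned in results -->
<field name="firstname" type="text_general" indexed="true" stored="false"/>

<!-- returned in results, but not searchable -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>

<!-- both searchable and returned -->
<field name="recordid" type="string" indexed="true" stored="true"/>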

On Jun 30, 2014, at 8:15 PM, tomasv  wrote:

> Thanks for the quick response.
> 
> Follow-up newbie question:
> If the fields are not stored, how is the server able to search for them
> after a restart? Where does it get the data to be searched?
> 
> Example:  "bob" (firstname) is indexed but not stored. After initial
> indexing, I query for "firstname:(bob)" and I get my document back. But if
> I restart the server, where does the server go to retrieve information that
> will allow me to query for "bob" once again? It would seem that "bob" got
> stored someplace if I can query on it after a restart.
> 
> My untrained mind thinks that searching for "firstname:(bob)" (after a
> restart) will fail, but that searching for "recordid:(12345)" (in my
> original example) will succeed since it was indexed+stored.
> 
> (stored + indexed makes total sense to me; it's the indexed but NOT stored
> that I can't get my head around).
> 
> Thanks!
> 
> 
> 
> On Mon, Jun 30, 2014 at 5:05 PM, Shawn Heisey-4 [via Lucene] <
> ml-node+s472066n4144894...@n3.nabble.com> wrote:
> 
>>> Hello All, (warning: newbie question)
>>> 
>>> In our schema.xml we have defined many fields such as:
>>> 
>>> 
>>> Other fields are defined as this:
>>> 
>>> 
>>> Q: If my server is restarted/ rebooted, will I still be able to search
>> for
>>> documents using the "firstname" field? Or will my records need to be
>>> re-indexed before I can search by first name?
>>> It seems that after a re-boot, I can search for the "stored='true'"
>> fields
>>> but not the "stored='false'" fields.
>>> 
>>> Am I interpreting this correctly? or am I missing something?
>> 
>> Fields that are not stored simply mean that they will not be returned in
>> search results. If they are indexed, then you will be able to search on
>> those fields.
>> 
>> This should be the case before or after a restart.
>> 
>> Thanks,
>> Shawn
>> 
>> 
>> 
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893p4144894.html
>> 
>> 
> 
> 
> 
> -- 
> /*---
> * Tomas at Home
> * dadk...@gmail.com
> * -*/
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893p4144895.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-08 Thread Steve McKay
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts 
behaving strangely in a socket-related way. Knowing exactly what's happening at 
the transport level is worth a month of guessing and poking.
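
Something along these lines will capture everything on the Solr port for later inspection in Wireshark (adjust the interface and port to your setup):

tcpdump -i any -s 0 -w solr-qtime.pcap port 8983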

On Jul 8, 2014, at 3:53 AM, Harald Kirsch  wrote:

> Hi all,
> 
> This is what happens when I run a regular wget query to log the current 
> number of documents indexed:
> 
> 2014-07-08:07:23:28 QTime=20 numFound="5720168"
> 2014-07-08:07:24:28 QTime=12 numFound="5721126"
> 2014-07-08:07:25:28 QTime=19 numFound="5721126"
> 2014-07-08:07:27:18 QTime=50071 numFound="5721126"
> 2014-07-08:07:29:08 QTime=50058 numFound="5724494"
> 2014-07-08:07:30:58 QTime=50033 numFound="5730710"
> 2014-07-08:07:31:58 QTime=13 numFound="5730710"
> 2014-07-08:07:33:48 QTime=50065 numFound="5734069"
> 2014-07-08:07:34:48 QTime=16 numFound="5737742"
> 2014-07-08:07:36:38 QTime=50037 numFound="5737742"
> 2014-07-08:07:37:38 QTime=12 numFound="5738190"
> 2014-07-08:07:38:38 QTime=23 numFound="5741208"
> 2014-07-08:07:40:29 QTime=50034 numFound="5742067"
> 2014-07-08:07:41:29 QTime=12 numFound="5742067"
> 2014-07-08:07:42:29 QTime=17 numFound="5742067"
> 2014-07-08:07:43:29 QTime=20 numFound="5745497"
> 2014-07-08:07:44:29 QTime=13 numFound="5745981"
> 2014-07-08:07:45:29 QTime=23 numFound="5746420"
> 
> As you can see, the QTime is just over 50 seconds at irregular intervals.
> 
> This happens independent of whether I am indexing documents with around 20 
> dps or not. First I thought about a dependence on the auto-commit of 5 
> minutes, but the 50-second hits are too irregular.
> 
> Furthermore, and this is *really strange*: when hooking strace on the solr 
> process, the 50 seconds QTimes disappear completely and consistently --- a 
> real Heisenbug.
> 
> Nevertheless, strace shows that there is a socket timeout of 50 seconds 
> defined in calls like this:
> 
> [pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) 
> = 1 ([{fd=96, revents=POLLIN}]) <0.40>
> 
> where the fd=96 is the result of
> 
> [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, 
> sin_port=htons(57236), sin_addr=inet_addr("ip address of local host")}, [16]) 
> = 96 <0.54>
> 
> where again fd=122 is the TCP port on which solr was started.
> 
> My hunch is that this is communication between the cores of solr.
> 
> I tried to search the internet for such a strange connection between socket 
> timeouts and strace, but could not find anything (the stackoverflow entry 
> from yesterday is my own :-(
> 
> 
> This smells a bit like a race condition/deadlock kind of thing which is 
> broken up by timing differences introduced by stracing the process.
> 
> Any hints appreciated.
> 
> For completeness, here is my setup:
> - solr-4.8.1,
> - cloud version running
> - 10 shards on 10 cores in one instance
> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
> - hosted on a vmware, 4 CPU cores, 16 GB RAM
> - single digit million docs indexed, exact number does not matter
> - zero query load
> 
> 
> Harald.



Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched 
doc, then reindex. Whether you use atomic updates or send the entire doc to 
Solr, it has to deleteById then add. The perf difference between the atomic 
updates and "normal" updates is likely minimal.

Atomic updates are for when you have changes and want to apply them to a 
document without affecting the other fields. A regular add will replace an 
existing document completely. AFAIK Solr will let you mix atomic updates with 
regular field values, but I don't think it's a good idea.

Steve

On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:

> Solr atomic update allows for changing only one or more fields of a
> document without having to re-index the entire document.  But what about
> the case where I am sending in the entire document?  In that case the whole
> document will be re-indexed anyway, right?  So I assume that there will be
> no saving.  I am actually thinking that there will be a performance penalty
> since atomic update requires Solr to first retrieve all the fields
> before updating.
> 
> Bill



Re: Solr atomic updates question

2014-07-08 Thread Steve McKay
Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="name">Steve McKay</field>
    <field name="city">Walla Walla</field>
    <field name="skills">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the 
existing doc, should it be deleted or retained? With this update it will 
obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="name">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="city" update="set">Walla Walla</field>
    <field name="skills" update="set">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and 
atomic updates are present on the same add because I haven't looked at that 
code since 4.0, but I think I could make a case for retaining salary or for 
discarding it. That by itself reeks--and it's also not well documented. Relying 
on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:

> Thanks for that under-the-cover explanation.
> 
> I am not sure what you mean by "mix atomic updates with regular field
> values".  Can you give an example?
> 
> Thanks.
> 
> Bill
> 
> 
> On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
> 
>> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
>> fetched doc, then reindex. Whether you use atomic updates or send the
>> entire doc to Solr, it has to deleteById then add. The perf difference
>> between the atomic updates and "normal" updates is likely minimal.
>> 
>> Atomic updates are for when you have changes and want to apply them to a
>> document without affecting the other fields. A regular add will replace an
>> existing document completely. AFAIK Solr will let you mix atomic updates
>> with regular field values, but I don't think it's a good idea.
>> 
>> Steve
>> 
>> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
>> 
>>> Solr atomic update allows for changing only one or more fields of a
>>> document without having to re-index the entire document.  But what about
>>> the case where I am sending in the entire document?  In that case the
>> whole
>>> document will be re-indexed anyway, right?  So I assume that there will
>> be
>>> no saving.  I am actually thinking that there will be a performance
>> penalty
>>> since atomic update requires Solr to first retrieve all the fields first
>>> before updating.
>>> 
>>> Bill
>> 
>> 



Re: Solr atomic updates question

2014-07-09 Thread Steve McKay
Right. Without atomic updates, the client needs to fetch the document (or 
rebuild it from the system of record), apply changes, and send the entire 
document to Solr, including fields that haven't changed. With atomic updates, 
the client sends a list of changes to Solr and the server handles the 
read/modify/write steps internally. That's the closest Solr can get to updating 
a doc in place.

Steve

On Jul 8, 2014, at 10:42 PM, Bill Au  wrote:

> I see what you mean now.  Thanks for the example.  It makes things very
> clear.
> 
> I have been thinking about the explanation in the original response more.
> According to that, both regular update with entire doc and atomic update
> involves a delete by id followed by a add.  But both the Solr reference doc
> (
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
> says that:
> 
> "The first is *atomic updates*. This approach allows changing only one or
> more fields of a document without having to re-index the entire document."
> 
> But since Solr is doing a delete by id followed by an add, does "without
> having to re-index the entire document" apply to the client side only?  On
> the server side the add means that the entire document is re-indexed, right?
> 
> Bill
> 
> 
> On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay  wrote:
> 
>> Take a look at this update XML:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="name">Steve McKay</field>
>>     <field name="city">Walla Walla</field>
>>     <field name="skills">Python</field>
>>   </doc>
>> </add>
>> 
>> Let's say employeeId is the key. If there's a fourth field, salary, on the
>> existing doc, should it be deleted or retained? With this update it will
>> obviously be deleted:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="name">Steve McKay</field>
>>   </doc>
>> </add>
>> 
>> With this XML it will be retained:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="city" update="set">Walla Walla</field>
>>     <field name="skills" update="set">Python</field>
>>   </doc>
>> </add>
>> 
>> I'm not willing to guess what will happen in the case where non-atomic and
>> atomic updates are present on the same add because I haven't looked at that
>> code since 4.0, but I think I could make a case for retaining salary or for
>> discarding it. That by itself reeks--and it's also not well documented.
>> Relying on iffy, poorly-documented behavior is asking for pain at upgrade
>> time.
>> 
>> Steve
>> 
>> On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:
>> 
>>> Thanks for that under-the-cover explanation.
>>> 
>>> I am not sure what you mean by "mix atomic updates with regular field
>>> values".  Can you give an example?
>>> 
>>> Thanks.
>>> 
>>> Bill
>>> 
>>> 
>>> On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
>>> 
>>>> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
>>>> fetched doc, then reindex. Whether you use atomic updates or send the
>>>> entire doc to Solr, it has to deleteById then add. The perf difference
>>>> between the atomic updates and "normal" updates is likely minimal.
>>>> 
>>>> Atomic updates are for when you have changes and want to apply them to a
>>>> document without affecting the other fields. A regular add will replace
>> an
>>>> existing document completely. AFAIK Solr will let you mix atomic updates
>>>> with regular field values, but I don't think it's a good idea.
>>>> 
>>>> Steve
>>>> 
>>>> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
>>>> 
>>>>> Solr atomic update allows for changing only one or more fields of a
>>>>> document without having to re-index the entire document.  But what
>> about
>>>>> the case where I am sending in the entire document?  In that case the
>>>> whole
>>>>> document will be re-indexed anyway, right?  So I assume that there will
>>>> be
>>>>> no saving.  I am actually thinking that there will be a performance
>>>> penalty
>>>>> since atomic update requires Solr to first retrieve all the fields
>> first
>>>>> before updating.
>>>>> 
>>>>> Bill
>>>> 
>>>> 
>> 
>> 



Re: NoClassDefFoundError while indexing in Solr

2014-07-23 Thread Steve McKay
BTW, Ameya, jhighlight-1.0.jar is in the Solr binary distribution, in
contrib/extraction/lib. There are a bunch of different libraries that
Tika uses for content extraction, so this seems like a good time to make
sure that Tika has all the jars available that it might need to process
the files you're indexing. Everything relevant should be included in
contrib/extraction/lib.

Steve

On Wed, Jul 23, 2014 at 01:53:45PM +, Pablo Queixalos wrote:
> There is a source code "parser" in Tika that in fact just renders the source 
> using an external source highlighter.
> 
> Seen in your stack trace: 
> org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
> 
> You are indexing code (java, c or groovy). Solr seems to be missing a 
> transitive tika dependency (http://freecode.com/projects/jhighlight).
> 
> Copying the lib into Solr's runtime lib directory should solve your issue.
> 
> 
> Pablo.
> 
> From: Shalin Shekhar Mangar 
> Sent: Wednesday, July 23, 2014 7:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NoClassDefFoundError while indexing in Solr
> 
> Solr is trying to load "com/uwyn/jhighlight/renderer/XhtmlRendererFactory"
> but that is not a class which is shipped or used by Solr. I think you have
> some custom plugins (a highlighter perhaps?) which uses that class and the
> classpath is not setup correctly.
> 
> 
> On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware  wrote:
> 
> > Hi
> >
> > I am running into below error while indexing a file in solr.
> >
> > Can you please help to fix this?
> >
> > ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
> > null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
> > com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> > at
> >
> > org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
> > at
> >
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
> > at
> >
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > at
> >
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > at
> >
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > at
> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> > at
> >
> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > at
> >
> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> > at
> >
> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > at
> >
> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> > at
> >
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > at
> >
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> > at
> >
> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > at
> >
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> > at org.eclipse.jetty.server.Server.handle(Server.java:368)
> > at
> >
> > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> > at
> >
> > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> > at
> >
> > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> > at
> >
> > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> > at
> >
> > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> > at
> >
> > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> > at
> >
> > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> > at
> >
> > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> > at java.lang.Thread.run(Unknown Source)
> > Caused by: java.lang.NoClassDefFoundError:
> > com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> > at
> >
> > org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
> > at
> >
> > org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> > at
> >
> > org.apache.solr.ha

Re: Any Solr consultants available??

2014-07-23 Thread Steve McKay
Perhaps the requirement means a total of 10 years of experience spread across 
Solr, HTML, XML, Java, Tomcat, JBoss, and MySQL. This doesn't seem likely, but 
it is satisfiable, so if we proceed on the assumption that a job posting 
doesn't contain unsatisfiable requirements then it's more reasonable than a 
naive interpretation. 

There exists the possibility of a satisfiable interpretation which is more 
intuitively appealing, and IMO this warrants further investigation. 

> On Jul 23, 2014, at 3:57 PM, Tri Cao  wrote:
> 
> Well, it's kind of hard to find a person if the requirement is "10 years' 
> experience with Solr" given that Solr was created in 2004.
> 
>> On Jul 23, 2014, at 12:45 PM, Jack Krupansky  wrote:
>> 
> 
>> I occasionally get pinged by recruiters looking for Solr application 
>> developers... here’s the latest. If you are interested, either contact 
>> Jessica directly or reply to me and I’ll forward your reply.
>> 
>> Even if you don’t strictly meet all the requirements... they are having 
>> trouble finding... anyone. All the great Solr guys I know are quite busy.
>> 
>> Thanks.
>> 
>> -- Jack Krupansky
>> 
>> From: Jessica Feigin 
>> Sent: Wednesday, July 23, 2014 3:36 PM
>> To: 'Jack Krupansky' 
>> Subject: Thank you!
>> 
>> Hi Jack,
>> 
>>  
>> 
>> Thanks for your assistance, below is the Solr Consultant job description:
>> 
>>  
>> 
>> Our client, a hospitality Fortune 500 company, is looking to update their 
>> platform to make accessing information easier for the franchisees. This is 
>> the first phase of the project which will take a few years. They want a 
>> hands on Solr consultant who has ideally worked in the search space. As you 
>> can imagine the company culture is great, everyone is really friendly and 
>> there is also an option to become permanent. They are looking for:
>> 
>>  
>> 
>> - 10+ years’ experience with Solr (Apache Lucene), HTML, XML, Java, Tomcat, 
>> JBoss, MySQL
>> 
>> - 5+ years’ experience implementing Solr builds of indexes, shards, and 
>> refined searches across semi-structured data sets to include architectural 
>> scaling
>> 
>> - Experience in developing a re-usable framework to support web site search; 
>> implement rich web site search, including the incorporation of metadata.
>> 
>> - Experienced in development using Java, Oracle, RedHat, Perl, shell, and 
>> clustering
>> 
>> - A strong understanding of Data analytics, algorithms, and large data 
>> structures
>> 
>> - Experienced in architectural design and resource planning for scaling 
>> Solr/Lucene capabilities.
>> 
>> - Bachelor's degree in Computer Science or related discipline.
>> 
>> 
>> 
>> 
>> 
>>  
>> 
>>  
>> 
>> Jessica Feigin 
>> Technical Recruiter
>> 
>> Technology Resource Management 
>> 30 Vreeland Rd., Florham Park, NJ 07932 
>> Phone 973-377-0040 x 415, Fax 973-377-7064 
>> Email: jess...@trmconsulting.com
>> 
>> Web site: www.trmconsulting.com
>> 
>> LinkedIn Profile: www.linkedin.com/in/jessicafeigin
>> 
>>  


Re: Invalid chunk header Error in solr

2014-08-23 Thread Steve McKay
Solr is complaining about receiving a malformed HTTP request. What 
happens when you send a correctly-formed multipart/form-data request? 
Also, is there anything you can add about the circumstances? Who's 
sending the requests that fail, is there any correlation between 
requests that fail, how often this is happening, whether other errors 
occur around the same time, etc.
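
For comparison, a correctly formed multipart upload to the extracting handler would look roughly like this (core name, id, and file are placeholders; curl builds the multipart body and chunking itself):

curl "http://localhost:8080/solr/collection1/update/extract?literal.id=doc1&commit=true" \
     -F "myfile=@/tmp/sample.pdf"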


lalitjangra wrote:


Hi,

I am using Solr 4.6 with Tomcat 7 and getting the Invalid chunk header error
frequently. I have updated multipartUploadLimitInKB to 20480 KB but
the issue still occurs.

Can anyone help here?

Regards.

974159 [http-bio-8080-exec-82] ERROR
org.apache.solr.servlet.SolrDispatchFilter –
null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed. Invalid chunk header
at
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
at
org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
at
org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
at
org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:393)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:315)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Invalid chunk header
at
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:172)
at
org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:346)
at org.apache.coyote.Request.doRead(Request.java:422)
at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at
org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:449)
at
org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
at
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at
org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:125)
at
org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
at
org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
at java.io.InputStream.read(InputStream.java:101)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Invalid-chunk-header-Error-in-solr-tp4154707.html

Sent from the Solr - User mailing list archive at Nabble.com.


RE: aggregate functions in Solr?

2011-09-27 Thread Steve McKay
> -Original Message-
> From: Esteban Donato [mailto:esteban.don...@gmail.com]
> Sent: Monday, September 26, 2011 2:08 PM
> To: solr-user@lucene.apache.org
> Subject: aggregate functions in Solr?
> 
> Hello guys,
> 
>   I need to implement a functionality which requires something similar
> to aggregate functions in SQL.  My Solr schema looks like this:
> 
> -doc_id: integer
> -date: date
> -value1: integer
> -value2: integer
> 
>   Basically the index contains some numerical values (value1, value2,
> etc) per doc and date.  Given a date range query, I need to return
> some stats consolidated by docs for that given date range.  I typical
> response could be something like this:
> 
> doc_id, sum(value1),  avg(value2),  sum(value1)/sum(value2).
> 
>   I checked StatsComponent using stats.facet=doc_id but it seems it
> doesn't cover my needs (especially for complex stats like
> sum(value1)/sum(value2)).  Also checked FieldCollapsing but I couldn't
> find a way to configure an aggregate function there.
> 
>   Is there any way to implement this, or will I have to resolve it out of
> Solr?
> 
> Regards,
> Esteban

To use your example, you could query 
stats.field=value1&stats.field=value2&stats.facet=doc_id and calculate 
sum(value1)/sum(value2) in a Velocity template. I'm not sure it's a *good* 
solution, but that's a way you could get the results you want in the response 
from Solr.
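
If it's easier to do the arithmetic client-side than in a template, the same stats response can be consumed with SolrJ. A rough sketch (core URL and date range are placeholders, field names as in your schema):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StatsRatioExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrQuery q = new SolrQuery("date:[2011-01-01T00:00:00Z TO 2011-03-31T23:59:59Z]");
        q.setRows(0);
        q.set("stats", "true");
        q.set("stats.field", "value1", "value2");
        q.set("stats.facet", "doc_id");

        QueryResponse rsp = solr.query(q);
        Map<String, FieldStatsInfo> stats = rsp.getFieldStatsInfo();

        // Index the value2 buckets by doc_id so they can be matched with value1's.
        Map<String, FieldStatsInfo> v2ByDoc = new HashMap<String, FieldStatsInfo>();
        for (FieldStatsInfo f : stats.get("value2").getFacets().get("doc_id")) {
            v2ByDoc.put(f.getName(), f);
        }
        for (FieldStatsInfo f1 : stats.get("value1").getFacets().get("doc_id")) {
            FieldStatsInfo f2 = v2ByDoc.get(f1.getName());
            double ratio = ((Number) f1.getSum()).doubleValue()
                         / ((Number) f2.getSum()).doubleValue();
            System.out.println(f1.getName() + ": sum(value1)/sum(value2) = " + ratio);
        }
        solr.shutdown();
    }
}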

Steve