Re: [MASSMAIL]Weighting of prominent text in HTML

2015-01-26 Thread Dan Davis
Helps lots.   Thanks, Jorge Luis.   Good point about different fields -
I'll just put the h1 and h2 (however deep I want to go) into fields, and we
can sort out weighting and whether we want it later with edismax.   The
blogs on adding plugins for that sort of thing look straightforward.
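For the record, what I have in mind is roughly this (field names are placeholders I made up,
untested): map the headings into their own fields at index time, then boost them at query time
with edismax, e.g.

    defType=edismax
    qf=title^5 h1_text^3 h2_text^2 content
    q=denial of service

That way the weights live in the request handler config and can be tuned without reindexing.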

On Mon, Jan 26, 2015 at 12:47 AM, Jorge Luis Betancourt González <
jlbetanco...@uci.cu> wrote:

> Hi Dan:
>
> Agreed, this question is more Nutch related than Solr ;)
>
> Nutch doesn't send any data to the /update/extract request handler; all the
> text and metadata extraction happens on the Nutch side rather than relying on
> the ExtractingRequestHandler provided by Solr. Underneath, Nutch uses Tika, the
> same technology as the ExtractingRequestHandler, so there shouldn't be any
> great difference.
>
> By default Nutch doesn't boost anything, as it is Solr's job to boost the
> content in the different fields, which is what happens when you run a query
> against Solr. Nutch calculates LinkRank, which is a variation of the famous
> PageRank (or the OPIC score, another scoring algorithm implemented in Nutch,
> which I believe is the default in Nutch 2.x). What you can do is map the
> heading tags into different fields and then apply a different boost to each
> field.
>
> The general idea with Nutch is to "make pieces of the web page" and store
> each piece in a different field in Solr; then you can tweak your relevance
> function using whatever values you see fit, so you don't need to write any
> plugin to accomplish this (at least for the h1, h2, etc. example you provided;
> if you want to extract other parts of the web page you'll need to write your
> own plugin to do so).
>
> Nutch is highly customizable: you can write a plugin for almost any piece
> of logic, from parsers to indexers, through URL filters, scoring
> algorithms, protocols and a long, long list. Usually the plugins are not so
> difficult to write; the hard part is knowing which extension point you
> need to use, and that comes with experience and a good dive into the
> source code.
>
> Hope this helps,
>
> - Original Message -
> From: "Dan Davis" 
> To: "solr-user" 
> Sent: Monday, January 26, 2015 12:08:13 AM
> Subject: [MASSMAIL]Weighting of prominent text in HTML
>
> By examining solr.log, I can see that Nutch is using the /update request
> handler rather than /update/extract.   So, this may be a more appropriate
> question for the nutch mailing list.   OTOH, y'all know the answer off the
> top of your head.
>
> Will Nutch boost text occurring in h1, h2, etc. more heavily than text in a
> normal paragraph?Can this weighting be tuned without writing a plugin?
>Is writing a plugin often needed because of the flexibility that is
> needed in practice?
>
> I wanted to call this post *Anatomy of a small scale search engine*, but
> lacked the nerve ;)
>
> Thanks, all and many,
>
> Dan Davis, Systems/Applications Architect
> National Library of Medicine
>
>
>
>


Re: Solr Recovery process

2015-01-26 Thread Ramkumar R. Aiyengar
https://issues.apache.org/jira/browse/SOLR-6359 has a patch which allows
this to be configured; it has not gone in yet.

Note that the current design of the UpdateLog causes it to be less
efficient if the number is bumped up too much, but certainly worth
experimenting with.
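
To give a flavor of it, the patch adds knobs on the update log in solrconfig.xml along
these lines (element names are taken from the patch and could change before it is committed):

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">10000</int>
      <int name="maxNumLogsToKeep">100</int>
    </updateLog>

Keeping more records makes peer sync possible after larger gaps, at the cost of larger
tlogs to scan.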
On 22 Jan 2015 02:47, "Nishanth S"  wrote:

> Thank you Shalin. So in a system where the indexing rate is more than 5K TPS
> or so, the replica will never be able to recover through the peer sync
> process. In my case I have mostly seen step 3, where a full copy happens,
> and if the index size is huge it takes a very long time for replicas to
> recover. Is there a way we can configure the number of missed updates for
> peer sync?
>
> Thanks,
> Nishanth
>
> On Wed, Jan 21, 2015 at 4:47 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Hi Nishanth,
> >
> > The recovery happens as follows:
> >
> > 1. PeerSync is attempted first. If the number of new updates on leader is
> > less than 100 then the missing documents are fetched directly and indexed
> > locally. The tlog tells us the last 100 updates very quickly. Other uses
> of
> > the tlog are for durability of updates and of course, startup recovery.
> > 2. If the above step fails then replication recovery is attempted. A hard
> > commit is called on the leader and then the leader is polled for the
> latest
> > index version and generation. If the leader's version and generation are
> > greater than local index's version/generation then the difference of the
> > index files between leader and replica are fetched and installed.
> > 3. If the above fails (because leader's version/generation is somehow
> equal
> > or more than local) then a full index recovery happens and the entire
> index
> > from the leader is fetched and installed locally.
> >
> > There are some other details involved in this process too but probably
> not
> > worth going into here.
> >
> > On Wed, Jan 21, 2015 at 5:13 PM, Nishanth S 
> > wrote:
> >
> > > Hello Everyone,
> > >
> > > I am hitting a few issues with solr replicas going into recovery and
> then
> > > doing a full index copy.I am trying to understand the solr recovery
> > > process.I have read a few blogs  on this and saw  that when leader
> > notifies
> > > a replica to  recover(in my case it is due to connection resets) it
> will
> > > try to do a peer sync first and  if the missed updates are more than
> 100
> > it
> > > will do a full index copy from the leader.I am trying to understand
> what
> > > peer sync is and where does tlog come into picture.Are tlogs replayed
> > only
> > > during server restart?.Can some one  help me with this?
> > >
> > > Thanks,
> > > Nishanth
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


SOS-help: How to store solr index data in hbase table???

2015-01-26 Thread zhangjianad

hi all,
Right now I store the Solr index data on local disk. I want to store the Solr
index data in an HBase table. How do I configure this? Any tips from anyone
who knows about this?


thanks!
Jan





SuggestStopFilter not usable in Solr 4.10.x?

2015-01-26 Thread Clemens Wyss DEV
https://issues.apache.org/jira/browse/LUCENE-5820 

Due to the missing factory the SuggestStopFilter is not "usable" before 
Solr/Lucene 5, right?

Any plan on when Solr 5 will appear?

How can I get hold of Solr/Lucene 5?


Re: SOS-help: How to store solr index data in hbase table???

2015-01-26 Thread Shawn Heisey
On 1/26/2015 2:56 AM, zhangjia...@dcits.com wrote:
>   Now I store solr index data on local disk. I want store solr index
> data in hbase table, how to configure ?  tips , any guys known about
> this???

I have no idea how you would do that. You *can* store your indexes in
HDFS storage, but that's not the same thing.

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

I have never done this, so I have no idea whether this documentation is
complete.
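
For what it's worth, the gist of that page is a directoryFactory swap in solrconfig.xml,
roughly like this (host and path are placeholders, and again, I have not run this myself):

    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    </directoryFactory>
    <lockType>hdfs</lockType>   (inside <indexConfig>)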

Thanks,
Shawn



query time join (stored or indexed value field?)

2015-01-26 Thread Alvaro Cabrerizo
Hi,

Is the query-time join using stored data
or indexed data from the fields set in "from" and "to"? (For example, the
facet feature makes the count based on the indexed data.)

I've made a small example (using tokenizers, stopwords...) and it seems
that the join uses the stored values, but it would be nice to confirm it.

Regards.


Re: query time join (stored or indexed value field?)

2015-01-26 Thread Mikhail Khludnev
indexed for sure, and/or docValues. not stored for sure.
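
(A quick way to confirm it yourself: a join such as the one below matches on the indexed
terms of the "from"/"to" fields, so the analysis applied at index time is what counts.
Field names here are only an example.)

    q={!join from=parent_id to=id}title:foo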

On Mon, Jan 26, 2015 at 3:44 PM, Alvaro Cabrerizo 
wrote:

> Hi,
>
> Is the time join query  using stored
> data
> or indexed data from the fields set in "from" and "to"? (For example, the
> facet feature makes the count based on the indexed data)
>
> I've made an small example (using tokenizers, stopwords...) and it seems
> that the join uses the stored one, but I would be nice to confirm it.
>
> Regards.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Sorting on a computed value

2015-01-26 Thread Shawn Heisey
On 1/25/2015 4:13 PM, tedsolr wrote:
> I'll bet some super user has figured this out. How can I perform a sort on a
> single computed field? I have a QParserPlugin that is collapsing docs based
> on data from multiple fields. I am summing the values from one numerical
> field 'X'. I was going to use a DocTransformer to inject that summed value
> into the search results as a new field. But I have now realized that I have
> to be able to sort on this summed field.
> 
> Without retrieving all results (which could be 1M+) in my app and sorting
> manually, is there any way to sort on my computed field within Solr?
> (using Solr 4.9)

Sorting by a function query:

http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-SortByFunction

The second URL also shows how to put the results of a function into the
search results as a pseudo-field.
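
For example (field names made up), something like this sorts on the sum and also returns
it as a pseudo-field:

    q=*:*&fl=id,total:sum(fieldX,fieldY)&sort=sum(fieldX,fieldY) desc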

You do mention that you are talking about summing the values from one
field ... if that's not being done with a function query, then this
probably does not apply, and I don't know what you'll need.

Thanks,
Shawn



Re: SOS-help: How to store solr index data in hbase table???

2015-01-26 Thread Dmitry Kan
A bit of googling reveals this article on integrating HBase and Lucene, for
example:
http://www.infoq.com/articles/LuceneHbase

The article references this code: https://github.com/akkumar/hbasene
It does not look like it is under active development, but it might be worth
exploring.

Dmitry

On Mon, Jan 26, 2015 at 2:42 PM, Shawn Heisey  wrote:

> On 1/26/2015 2:56 AM, zhangjia...@dcits.com wrote:
> >   Now I store solr index data on local disk. I want store solr index
> > data in hbase table, how to configure ?  tips , any guys known about
> > this???
>
> I have no idea how you would do that. You *can* store your indexes in
> HDFS storage, but that's not the same thing.
>
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>
> I have never done this, so I have no idea whether this documentation is
> complete.
>
> Thanks,
> Shawn
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Indexed epoch time in Solr

2015-01-26 Thread Jim . Musil
If you are using the DataImportHandler, you can leverage one of the
transformers, such as the DateFormatTransformer:

http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer


If you are updating documents directly you can define a regex
transformation in your schema.xml:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory


If you have control over the input, then I always find it better to just
transform it prior to sending it into solr.

Jim

On 1/25/15, 11:35 PM, "Ahmed Adel"  wrote:

>Hi All,
>
>Is there a way to convert unix time field that is already indexed to
>ISO-8601 format in query response? If this is not possible on the query
>level, what is the best way to copy this field to a new Solr standard date
>field.
>
>Thanks,
>
>-- 
>*Ahmed Adel*
>



Re: Need Help with custom ZIPURLDataSource class

2015-01-26 Thread Dan Davis
I have seen such errors by looking under Logging in the Solr Admin UI.
There is also the LogTransformer for Data Import Handler.

However, it is a design choice in Data Import Handler to skip fields not in
the schema.   I would suggest you always use Debug and Verbose to do the
first couple of documents through the GUI, and then look at the debugging
output with a fine-toothed comb.

I'm not sure whether there's an option for it, but it would be nice if the
Data Import Handler could collect skipped fields into the status response.
  That would highlight your problem without forcing you to look in other
areas.
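
Concretely, the debug run I have in mind is just hitting the handler with the debug
parameters (parameter names from memory, so double-check against the wiki), e.g.

    /solr/nvd-rss/dataimport?command=full-import&debug=true&verbose=true&rows=10

which shows the parsed rows before they are indexed.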


On Fri, Jan 23, 2015 at 9:51 PM, Carl Roberts  wrote:

> NVM - I have this working.
>
> The problem was this:  pk="link" in rss-data-config.xml, but the unique key
> in schema.xml is not link - it is id.
>
> From rss-data-config.xml:
>
>  *pk="link"*
> url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.
> xml.zip"
> processor="XPathEntityProcessor"
> forEach="/nvd/entry">
> 
>  commonField="true" />
>  commonField="true" />
> 
> 
>
> From schema.xml:
>
> * id
>
> *What really bothers me is that there were no errors output by Solr to
> indicate this type of misconfiguration error and all the messages that Solr
> gave indicated the import was successful.  This lack of appropriate error
> reporting is a pain, especially for someone learning Solr.
>
> Switching pk="link" to pk="id" solved the problem and I was then able to
> import the data.
>
> On 1/23/15, 6:34 PM, Carl Roberts wrote:
>
>>
>> Hi,
>>
>> I created a custom ZIPURLDataSource class to unzip the content from an
>> http URL for an XML ZIP file and it seems to be working (at least I have
>> no errors), but no data is imported.
>>
>> Here is my configuration in rss-data-config.xml:
>>
>> 
>> > readTimeout="3"/>
>> 
>> > pk="link"
>> url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";
>> processor="XPathEntityProcessor"
>> forEach="/nvd/entry"
>> transformer="DateFormatTransformer">
>> 
>> 
>> 
>> > xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
>> commonField="false" />
>> > xpath="/nvd/entry/vulnerable-software-list/product" commonField="false"
>> />
>> > commonField="false" />
>> > commonField="false" />
>> 
>> 
>> 
>> 
>>
>>
>> Attached is the ZIPURLDataSource.java file.
>>
>> It actually unzips and saves the raw XML to disk, which I have verified
>> to be a valid XML file.  The file has one or more entries (here is an
>> example):
>>
>> http://scap.nist.gov/schema/scap-core/0.1";
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>> xmlns:patch="http://scap.nist.gov/schema/patch/0.1";
>> xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4";
>> xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2";
>> xmlns:cpe-lang="http://cpe.mitre.org/language/2.0";
>> xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0";
>> pub_date="2015-01-10T05:37:05"
>> xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1
>> http://nvd.nist.gov/schema/patch_0.1.xsd
>> http://scap.nist.gov/schema/scap-core/0.1
>> http://nvd.nist.gov/schema/scap-core_0.1.xsd
>> http://scap.nist.gov/schema/feed/vulnerability/2.0
>> http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd"; nvd_xml_version="2.0">
>> 
>> http://nvd.nist.gov/";>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> cpe:/o:freebsd:freebsd:2.2.8
>> cpe:/o:freebsd:freebsd:1.1.5.1
>> cpe:/o:freebsd:freebsd:2.2.3
>> cpe:/o:freebsd:freebsd:2.2.2
>> cpe:/o:freebsd:freebsd:2.2.5
>> cpe:/o:freebsd:freebsd:2.2.4
>> cpe:/o:freebsd:freebsd:2.0.5
>> cpe:/o:freebsd:freebsd:2.2.6
>> cpe:/o:freebsd:freebsd:2.1.6.1
>> cpe:/o:freebsd:freebsd:2.0.1
>> cpe:/o:freebsd:freebsd:2.2
>> cpe:/o:freebsd:freebsd:2.0
>> cpe:/o:openbsd:openbsd:2.3
>> cpe:/o:freebsd:freebsd:3.0
>> cpe:/o:freebsd:freebsd:1.1
>> cpe:/o:freebsd:freebsd:2.1.6
>> cpe:/o:openbsd:openbsd:2.4
>> cpe:/o:bsdi:bsd_os:3.1
>> cpe:/o:freebsd:freebsd:1.0
>> cpe:/o:freebsd:freebsd:2.1.7
>> cpe:/o:freebsd:freebsd:1.2
>> cpe:/o:freebsd:freebsd:2.1.5
>> cpe:/o:freebsd:freebsd:2.1.7.1
>> 
>> CVE-1999-0001
>> 1999-12-30T00:00:00.000-05:00
>>
>> 2010-12-16T00:00:00.000-05:00
>>
>> 
>> 
>> 5.0
>> NETWORK
>> LOW
>> NONE
>> NONE
>> NONE
>> PARTIAL
>> http://nvd.nist.gov
>> 2004-01-01T00:00:00.000-05:00
>>
>> 
>> 
>> 
>> 
>> OSVDB
>> http://www.osvdb.org/5707";
>> xml:lang="en">5707
>> 
>> 
>> CONFIRM
>> http://www.openbsd.org/errata23.html#tcpfix";
>> xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix
>>
>> 
>> ip_input.c in BSD-derived TCP/IP implementations allows
>> remote attackers to cause a denial of service (crash or hang) via
>> crafted packets.
>> 
>>
>>
>> Here is the curl command:
>>
>> curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import
>>
>> And here is the output from the console for Jetty:
>>
>> main{StandardDirectoryReader(segm

Re: Need help importing data

2015-01-26 Thread Dan Davis
Glad it worked out.

On Fri, Jan 23, 2015 at 9:50 PM, Carl Roberts  wrote:

> NVM
>
> I figured this out.  The problem was this:  pk="link" in
> rss-data-config.xml, but the unique key in schema.xml is not link - it is id.
>
> From rss-data-config.xml:
>
>  *pk="link"*
> url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";
> processor="XPathEntityProcessor"
> forEach="/nvd/entry">
> 
>  commonField="true" />
>  commonField="true" />
> 
> 
>
> From schema.xml:
>
> * id
>
> *What really bothers me is that there were no errors output by Solr to
> indicate this type of misconfiguration error and all the messages that Solr
> gave indicated the import was successful.  This lack of appropriate error
> reporting is a pain, especially for someone learning Solr.
>
> Switching pk="link" to pk="id" solved the problem and I was then able to
> import the data.
>
>
>
> On 1/23/15, 9:39 PM, Carl Roberts wrote:
>
>> Hi,
>>
>> I have set log4j logging to level DEBUG and I have also modified the code
>> to see what is being imported and I can see the nextRow() records, and the
>> import is successful, however I have no data. Can someone please help me
>> figure this out?
>>
>> Here is the logging output:
>>
>> ow:  r1={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264,
>> $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r1={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r1={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r1={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r1={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r1={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> r3={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]
>> -org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow:
>> URL={url}
>> 2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]
>> -org.apache

Re: Indexed epoch time in Solr

2015-01-26 Thread Dan Davis
I think copying to a new Solr date field is your best bet, because then you
have the flexibility to do date range facets in the future.

If you can re-index, and are using Data Import Handler, Jim Musil's
suggestion is just right.

If you can re-index, and are not using Data Import Handler:

   - This seems like a job for an UpdateRequestProcessor, but I don't see one
   for this.
   - This seems to be a good candidate for a standard, core
   UpdateRequestProcessor, but I haven't checked Jira for a bug report.

If the scale is too large to re-index, then there is surely still a way,
but I'm not sure I can advise you on the best one.  I'm not a Solr expert
yet... just someone on the list with an IR background.
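
Just to make the UpdateRequestProcessor idea concrete, here is a rough sketch (class and
field names are invented, and I have not compiled or run this) of a processor that copies
an epoch-seconds field into a date field at index time:

    import java.io.IOException;
    import java.util.Date;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class EpochToDateProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            // source field holding unix seconds; both field names are hypothetical
            Object epoch = doc.getFieldValue("timestamp_epoch");
            if (epoch != null) {
              long seconds = Long.parseLong(epoch.toString());
              // target field would be a solr.TrieDateField in schema.xml
              doc.setField("timestamp_dt", new Date(seconds * 1000L));
            }
            super.processAdd(cmd);
          }
        };
      }
    }

It would then be wired into an updateRequestProcessorChain in solrconfig.xml ahead of
RunUpdateProcessorFactory.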

On Mon, Jan 26, 2015 at 12:35 AM, Ahmed Adel  wrote:

> Hi All,
>
> Is there a way to convert unix time field that is already indexed to
> ISO-8601 format in query response? If this is not possible on the query
> level, what is the best way to copy this field to a new Solr standard date
> field.
>
> Thanks,
>
> --
> *Ahmed Adel*
> 
>


Re: SuggestStopFilter not usable in Solr 4.10.x?

2015-01-26 Thread Alexandre Rafalovitch
RC1 of Solr 5 should be out very soon (days). But you can always
download the latest source from the 5.0 branch in svn (remember to use a
shallow copy if using git) and build it yourself ('ant package' inside the
'solr' directory). It's not terribly hard.
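
Roughly (branch path from memory, so double-check it against the svn tree):

    svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_0/ lucene_solr_5_0
    cd lucene_solr_5_0/solr
    ant ivy-bootstrap
    ant package

The artifacts should end up under solr/package/.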

Regards,
  Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 26 January 2015 at 07:30, Clemens Wyss DEV  wrote:
> https://issues.apache.org/jira/browse/LUCENE-5820
>
> Due to the missing factory the SuggestStopFilter is not "usable" before 
> Solr/Lucene 5, right?
>
> Any plan on when Solr 5 will appear?
>
> How can I get hold of Solr/Lucene 5?


Re: Solr Recovery process

2015-01-26 Thread Nishanth S
Thank you Ram.

On Mon, Jan 26, 2015 at 1:49 AM, Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/SOLR-6359 has a patch which allows
> this to be configured, it has not gone in as yet.
>
> Note that the current design of the UpdateLog causes it to be less
> efficient if the number is bumped up too much, but certainly worth
> experimenting with.
> On 22 Jan 2015 02:47, "Nishanth S"  wrote:
>
> > Thank you Shalin.So in a system where the indexing rate is more than 5K
> TPS
> > or so the replica  will never be able to recover   through peer sync
> > process.In  my case I have mostly seen  step 3 where a full copy happens
> > and  if the index size is huge it takes a very long time for replicas to
> > recover.Is there a way we can  configure the  number of missed updates
> for
> > peer sync.
> >
> > Thanks,
> > Nishanth
> >
> > On Wed, Jan 21, 2015 at 4:47 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > Hi Nishanth,
> > >
> > > The recovery happens as follows:
> > >
> > > 1. PeerSync is attempted first. If the number of new updates on leader
> is
> > > less than 100 then the missing documents are fetched directly and
> indexed
> > > locally. The tlog tells us the last 100 updates very quickly. Other
> uses
> > of
> > > the tlog are for durability of updates and of course, startup recovery.
> > > 2. If the above step fails then replication recovery is attempted. A
> hard
> > > commit is called on the leader and then the leader is polled for the
> > latest
> > > index version and generation. If the leader's version and generation
> are
> > > greater than local index's version/generation then the difference of
> the
> > > index files between leader and replica are fetched and installed.
> > > 3. If the above fails (because leader's version/generation is somehow
> > equal
> > > or more than local) then a full index recovery happens and the entire
> > index
> > > from the leader is fetched and installed locally.
> > >
> > > There are some other details involved in this process too but probably
> > not
> > > worth going into here.
> > >
> > > On Wed, Jan 21, 2015 at 5:13 PM, Nishanth S 
> > > wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > I am hitting a few issues with solr replicas going into recovery and
> > then
> > > > doing a full index copy.I am trying to understand the solr recovery
> > > > process.I have read a few blogs  on this and saw  that when leader
> > > notifies
> > > > a replica to  recover(in my case it is due to connection resets) it
> > will
> > > > try to do a peer sync first and  if the missed updates are more than
> > 100
> > > it
> > > > will do a full index copy from the leader.I am trying to understand
> > what
> > > > peer sync is and where does tlog come into picture.Are tlogs replayed
> > > only
> > > > during server restart?.Can some one  help me with this?
> > > >
> > > > Thanks,
> > > > Nishanth
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>


REMINDER: ApacheCon 2015 Call For Papers Ends This Week (February 1st)

2015-01-26 Thread Chris Hostetter


(cross posted, please confine replies to general@lucene)


ApacheCon 2015 will be in Austin, Texas, April 13-17.

http://apachecon.com/

The Call For Papers is currently open, but it ends 2015-02-01 (11:55PM GMT-0600)

https://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


This is a great opportunity to showcase how you use Lucene/Solr, or help 
teach people about features of Lucene/Solr that you think folks might 
not know enough about or fully appreciate.


All levels of talks are welcome -- you don't have to be a Lucene/Solr 
expert to submit a proposal.  Talks targeted at entry level users, and 
talks by novice users about their experiences are frequently in high 
demand.


For more information, and advice on how to prepare a great talk, please 
see the CFP webpage...


https://events.linuxfoundation.org/events/apachecon-north-america/program/cfp



-Hoss
http://www.lucidworks.com/


solrcloud shard splitting with lock type native

2015-01-26 Thread calin.grecu
Hi there,

Shard splitting seems to fail if the lock type is native. Here is my config
setting:
   
native
1000
  


Shard splitting works if I set the lock type to single or none. However,
after splitting, I am not able to set the lock type back to native, which is
the default.

Here is the log for when i try to split and using lock type native:

OverseerCollectionProcessor.processMessage : splitshard , {
  "operation":"splitshard",
  "shard":"shard1",
  "collection":"mycollection",
  "async":"myhandle11"}
  
  
  1/26/2015, 1:49:02 PM
ERROR
CoreContainer
Error creating core [mycollection_shard1_0_replica1]: Error opening new
searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.(SolrCore.java:873)
at org.apache.solr.core.SolrCore.(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at
org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.(SolrCore.java:845)
... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: NativeFSLock@/nfs/solr/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:753)
at 
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more

org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.(SolrCore.java:873)
at org.apache.solr.core.SolrCore.(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at
org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.(SolrCore.java:845)
... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: NativeFSLock@/nfs/solr/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:753)
at 
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more

1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(),​ indicates a bug --
POSSIBLE RESOURCE LEAK!!!
1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3230)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3203)
at org.apache.lucene.index.IndexWriter.

Re: Sorting on a computed value

2015-01-26 Thread tedsolr
That's an interesting link Shawn. Especially since it mentions the
possibility of sorting on pseudo-fields.

My delegating collector computes the custom stats and stores them in the
request context. I have a doc transformer that then grabs the stats for each
doc and inserts the data in the output. Here's a sample return doc:

 {
"ITEM_DESCRIPTION": "FREIGHT PAY AMT FOR ITEM - 30934014",
"SUPPLIER_NAME": "JESUS ACOSTA MORENO",
"GL_ACCOUNT_NAME": "-",
"PART_NUMBER": "-",
"MCC_CODE": "SDBHAULER.NA",
"[AggregationStats]": {
  "count": 1,
  "spend": 8402.39
}
  },

My stats are in the [AggregationStats] "field". I don't know if this
qualifies as a pseudo-field. Sorting happens before my doc transformer is
called, so I don't think this data is available for sort. Like you said, I'm
not using a function query to create this data, so maybe this idea won't
work.

I'm going to try to use doc scoring. If I can make the score match my pseudo
fields then it might work.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-on-a-computed-value-tp4181875p4182060.html
Sent from the Solr - User mailing list archive at Nabble.com.


SimplePostTool with extracted Outlook messages

2015-01-26 Thread Mark
I'm looking to index some outlook extracted messages *.msg

I notice .msg isn't one of the default file types, so I tried the following:

java -classpath dist/solr-core-4.10.3.jar -Dtype=application/vnd.ms-outlook
org.apache.solr.util.SimplePostTool C:/temp/samplemsg/*.msg

That didn't work

However curl did:

curl "
http://localhost:8983/solr/update/extract?commit=true&overwrite=true&literal.id=6252671B765A1748992DF1A6403BDF81A4A15E00";
-F "myfile=@6252671B765A1748992DF1A6403BDF81A4A15E00.msg"

My question is why does the second work and not the first?


Re: replicas goes in recovery mode right after update

2015-01-26 Thread Vijay Sekhri
86775/11906:delGen=15
_5co(4.10.0):C871785/93841:delGen=10 _5m7(4.10.0):C122852/31675:delGen=11
_5hm(4.10.0):C457977/32535:delGen=11 _5q0(4.10.0):C13610/649:delGen=6
_5kb(4.10.0):C424868/19149:delGen=11 _5f5(4.10.0):C116528/42495:delGen=1
_5nx(4.10.0):C33236/20668:delGen=1 _5qm(4.10.0):C29770/1:delGen=1
_5o8(4.10.0):C27155/7531:delGen=1 _5of(4.10.0):C38545/5677:delGen=1
_5p7(4.10.0):C37457/648:delGen=1 _5qv(4.10.0):C3973
_5q1(4.10.0):C402/1:delGen=1 _5q2(4.10.0):C779 _5qa(4.10.0):C967
_5qc(4.10.0):C1828 _5qh(4.10.0):C1765 _5qi(4.10.0):C1241 _5qq(4.10.0):C1997
_5qr(4.10.0):C1468 _5qp(4.10.0):C1729 _5qo(4.10.0):C3456/1:delGen=1
_5qu(4.10.0):C27 _5qt(4.10.0):C30 _5qx(4.10.0):C638 _5qy(4.10.0):C1407
_5qw(4.10.0):C802 _5r2(4.10.0):C32769/1:delGen=1 _5r3(4.10.0):C26057
_5r4(4.10.0):C23934/1:delGen=1
14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
startFullFlush
14:16:49,284 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
anyChanges? numDocsInRam=24222 deletes=true hasTickets:false
pendingChangesInFullFlush: false
14:16:49,284 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWFC][recoveryExecutor-7-thread-1]:
addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0 24221
deleted terms (unique count=24220) bytesUsed=4455794, segment=_5r5,
aborting=false, numDocsInRAM=24222, deleteQueue=DWDQ: [ generation: 1 ]]
14:16:49,322 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flush
postings as segment _5r5 numDocs=24222

==> gc.20150126-135638.log <==
1211.362: [GC1211.362: [ParNew: 966947K->88429K(996800K), 0.0191260 secs]
1499845K->633603K(1995752K), 0.0192710 secs] [Times: user=0.20 sys=0.00,
real=0.02 secs]

==> server.log <==
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: new
segment has 1 deleted docs
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: new
segment has no vectors; no norms; no docValues; prox; freqs
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]:
flushedFiles=[_5r5_Lucene41_0.pos, _5r5.fdx, _5r5.fnm, _5r5.fdt,
_5r5_Lucene41_0.tim, _5r5_Lucene41_0.tip, _5r5_Lucene41_0.doc]
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flushed
codec=Lucene410
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flushed:
segment=_5r5 ramUsed=75.564 MB newFlushedSize(includes docstores)=19.546 MB
docs/MB=1,239.201
14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flush:
write 1 deletes gen=-1
14:16:50,388 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
publishFlushedSegment seg-private updates=null
14:16:50,388 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]:
publishFlushedSegment
14:16:50,388 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [BD][recoveryExecutor-7-thread-1]: push
deletes  24222 deleted terms (unique count=24221) bytesUsed=286752 delGen=8
packetCount=4 totBytesUsed=1259648
14:16:50,388 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]: publish
sets newSegment delGen=9 seg=_5r5(4.10.0):C24222/1:delGen=1
14:16:50,388 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: now
checkpoint "_4qe(4.10.0):C4312879/1370002:delGen=56
_554(4.10.0):C3995865/780418:delGen=23 _56u(4.10.0):C286775/11906:delGen=15
_5co(4.10.0):C871785/93841:delGen=10 _5m7(4.10.0):C122852/31675:delGen=11
_5hm(4.10.0):C457977/32535:delGen=11 _5q0(4.10.0):C13610/649:delGen=6
_5kb(4.10.0):C424868/19149:delGen=11 _5f5(4.10.0):C116528/42495:delGen=1
_5nx(4.10.0):C33236/20668:delGen=1 _5qm(4.10.0):C29770/1:delGen=1
_5o8(4.10.0):C27155/7531:delGen=1 _5of(4.10.0):C38545/5677:delGen=1
_5p7(4.10.0):C37457/648:delGen=1 _5qv(4.10.0):C3973
_5q1(4.10.0):C402/1:delGen=1 _5q2(4.10.0):C779 _5qa(4.10.0):C967
_5qc(4.10.0):C1828 _5qh(4.10.0):C1765 _5qi(4.10.0):C1241 _5qq(4.10.0):C1997
_5qr(4.10.0):C1468 _5qp(4.10.0):C1729 _5qo(4.10.0):C3456/1:delGen=1
_5qu(4.10.0):C27 _5qt(4.10.0):C30 _5qx(4.10.0):C638 _5qy(4.10.0):C1407
_5qw(4.10.0):C802 _5r2(4.10.0):C32769/1:delGen=1 _5r3(4.10.0):C26057
_5r4(4.10.0):C23934/1:delGen=1 _5r5(4.10.0):C24222/1:delGen=1" [34 segments
; isCommit = false]
14:16:50,388 INFO  [org.apache.solr.update.LoggingInf

solr cloud replicas goes in recovery mode after update

2015-01-26 Thread Vijay Sekhri
Hi Erick,

The older message seems to be deleted so I am sending a new one
http://osdir.com/ml/solr-user.lucene.apache.org/2015-01/msg00773.html


In the solr.xml file I had the zk timeout set to
<int name="zkClientTimeout">${zkClientTimeout:45}</int>
One thing that made it a bit better now is the zk tick time and syncLimit
settings. I set them to higher values as below. This may not be advisable
though.

tickTime=3
initLimit=30
syncLimit=20

Now we observe that replicas do not go into recovery as often as before.
In the whole cluster at a given time I would have a couple of replicas in
recovery, whereas earlier it was multiple replicas from every shard.
On the wiki https://wiki.apache.org/solr/SolrCloud the FAQ says "The maximum
is 20 times the tickTime.", so I decided to increase the tick
time. Is this the correct approach?

One question I have is whether the auto commit settings have anything to do
with this or not. Do they induce extra work for the searchers that would cause
this to happen? I have tried with the following settings
*  *
*50*
*90*
**

**
*20*
*3*
*false*
**

I have increased the heap size to 15GB for each JVM instance. I
monitored during full indexing what the heap usage looks like and it never
goes beyond 8 GB.  I don't see any Full GC happening at any point.


 Our rate is variable. It is not a sustained rate of 6000/second;
however, there are intervals where it would reach that much, come down,
grow again, and come down.  So if I took an average it would be
only 600/second, but that is not the real rate at any given time.
The version of solr cloud is *4.10*.  All indexers are basically java programs
running on different hosts using the CloudSolrServer api.
As I mentioned it is much better now than before, however not completely
as expected. If possible we would like to have none of them go into recovery.

I captured some logs before and after recovery

14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
"_5r2_1.del"
14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec to
checkpoint
14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: now
checkpoint "_4qe(4.10.0):C4312879/1nts ; isCommit = false]
14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
"_5r4_1.del"
14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec to
checkpoint
14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
recoveryExecutor-7-thread-1 finishFullFlush success=true
14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
findMerges: 34 segments
14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_554(4.10.0):C3995865/780418:delGen=23 size=3669.307 MB [skip: too
large]
14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_4qe(4.10.0):C4312879/1370113:delGen=57 size=3506.254 MB [skip: too
large]
14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5co(4.10.0):C871785/93995:delGen=11 size=853.668 MB
14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5kb(4.10.0):C424868/49572:delGen=12 size=518.704 MB
14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5hm(4.10.0):C457977/83353:delGen=12 size=470.422 MB
14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_56u(4.10.0):C286775/11906:delGen=15 size=312.952 MB
14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5f5(4.10.0):C116528/43621:delGen=2 size=95.529 MB
14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5m7(4.10.0):C122852/54010:delGen=12 size=84.949 MB
14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5p7(4.10.0):C37457/649:delGen=2 size=54.241 MB
14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
(recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
seg=_5of(4.10.0):C38545/5677:delGen=1 size=50.672 MB
14:16:

Re: Sorting on a computed value

2015-01-26 Thread Mikhail Khludnev
I'm sorry for spoiling, but it's the fabulous FakeScorer pattern in Lucene.
e.g. look at
https://github.com/apache/lucene-solr/blob/trunk/lucene/grouping/src/java/org/apache/lucene/search/grouping/BlockGroupingCollector.java#L355
When your delegating collector is handed a scorer via setScorer(), it
shouldn't just pass it to the delegate as is, but pass a FakeScorer
instance instead (copy-paste your own and make it private, absolutely!),
into which your collector can set the docNum and the application-calculated
score before delegating the collect() notification.
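
Very roughly, the shape of it, assuming your collector extends Solr's DelegatingCollector
(treat this as pseudocode; the Scorer methods shown are as I remember them from Lucene 4.x):

    private static class FakeScorer extends Scorer {
      int doc;
      float score;
      FakeScorer() { super(null); }   // no Weight needed for this trick
      @Override public int docID() { return doc; }
      @Override public float score() { return score; }
      @Override public int freq() { return 1; }
      @Override public int nextDoc() { throw new UnsupportedOperationException(); }
      @Override public int advance(int target) { throw new UnsupportedOperationException(); }
      @Override public long cost() { return 1; }
    }

    private FakeScorer fake;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      fake = new FakeScorer();
      super.setScorer(fake);      // the delegate now reads scores from the fake
    }

    @Override
    public void collect(int doc) throws IOException {
      fake.doc = doc;
      fake.score = computeAggregatedValue(doc);  // your summed value; method name invented
      super.collect(doc);
    }

Then sorting by score gives you ordering by the computed value.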

On Tue, Jan 27, 2015 at 12:30 AM, tedsolr  wrote:

> That's an interesting link Shawn. Especially since it mentions the
> possibility of sorting on pseudo-fields.
>
> My delegating collector computes the customs stats and stores them in the
> request context. I have a doc transformer that then grabs the stats for
> each
> doc and inserts the data in the output. Here's a sample return doc:
>
>  {
> "ITEM_DESCRIPTION": "FREIGHT PAY AMT FOR ITEM - 30934014",
> "SUPPLIER_NAME": "JESUS ACOSTA MORENO",
> "GL_ACCOUNT_NAME": "-",
> "PART_NUMBER": "-",
> "MCC_CODE": "SDBHAULER.NA",
> "[AggregationStats]": {
>   "count": 1,
>   "spend": 8402.39
> }
>   },
>
> My stats are in the [AggregationStats] "field". I don't know if this
> qualifies as a pseudo-field. Sorting happens before my doc transformer is
> called, so I don't think this data is available for sort. Like you said,
> I'm
> not using a function query to create this data, so maybe this idea won't
> work.
>
> I'm going to try to use doc scoring. If I can make the score match my
> pseudo
> fields then it might work.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Sorting-on-a-computed-value-tp4181875p4182060.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: SimplePostTool with extracted Outlook messages

2015-01-26 Thread Alexandre Rafalovitch
Seems like an apples-to-oranges comparison here.

I would try giving an explicit end point (.../extract), a single
message, and a literal id for the SimplePostTool and seeing whether
that works. Not providing an ID could definitely be an issue.

I would also specifically look at the logs on the server side and see
what the messages say to understand the discrepancies. Solr 5 is a bit
more verbose about what's going on under the covers, but that's not
available yet.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 26 January 2015 at 16:34, Mark  wrote:
> I'm looking to index some outlook extracted messages *.msg
>
> I notice by default msg isn't one of the defaults so I tried the following:
>
> java -classpath dist/solr-core-4.10.3.jar -Dtype=application/vnd.ms-outlook
> org.apache.solr.util.SimplePostTool C:/temp/samplemsg/*.msg
>
> That didn't work
>
> However curl did:
>
> curl "
> http://localhost:8983/solr/update/extract?commit=true&overwrite=true&literal.id=6252671B765A1748992DF1A6403BDF81A4A15E00";
> -F "myfile=@6252671B765A1748992DF1A6403BDF81A4A15E00.msg"
>
> My question is why does the second work and not the first?


Re: SimplePostTool with extracted Outlook messages

2015-01-26 Thread Mark
A little further

This fails

 java -classpath dist/solr-core-4.10.3.jar
-Dtype=application/vnd.ms-outlook org.apache.solr.util.SimplePostTool
C:/temp/samplemsg/*.msg

With:

SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 415 for URL:
http://localhost:8983/solr/update
POSTing file 6252671B765A1748992DF1A6403BDF81A4A22C00.msg
SimplePostTool: WARNING: Solr returned an error #415 (Unsupported Media
Type) for url: http://localhost:8983/solr/update
SimplePostTool: WARNING: Response: 

4150Unsupported
ContentType: application/vnd.ms-outlook  Not in: [applicat
ion/xml, text/csv, text/json, application/csv, application/javabin,
text/xml, application/json]415


However just calling the extract works

curl "http://localhost:8983/solr/update/extract?extractOnly=true"; -F
"myfile=@6252671B765A1748992DF1A6403BDF81A4A22C00.msg"

Regards

Mark

On 26 January 2015 at 21:47, Alexandre Rafalovitch 
wrote:

> Seems like apple to oranges comparison here.
>
> I would try giving an explicit end point (.../extract), a single
> message, and a literal id for the SimplePostTool and seeing whether
> that works. Not providing an ID could definitely be an issue.
>
> I would also specifically look on the server side in the logs and see
> what the messages say to understand the discrepancies. Solr 5 is a bit
> more verbose about what's going under the covers, but that's not
> available yet.
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 26 January 2015 at 16:34, Mark  wrote:
> > I'm looking to index some outlook extracted messages *.msg
> >
> > I notice by default msg isn't one of the defaults so I tried the
> following:
> >
> > java -classpath dist/solr-core-4.10.3.jar
> -Dtype=application/vnd.ms-outlook
> > org.apache.solr.util.SimplePostTool C:/temp/samplemsg/*.msg
> >
> > That didn't work
> >
> > However curl did:
> >
> > curl "
> >
> http://localhost:8983/solr/update/extract?commit=true&overwrite=true&literal.id=6252671B765A1748992DF1A6403BDF81A4A15E00
> "
> > -F "myfile=@6252671B765A1748992DF1A6403BDF81A4A15E00.msg"
> >
> > My question is why does the second work and not the first?
>


How to implement Auto complete, suggestion client side

2015-01-26 Thread Olivier Austina
Hi All,

I would say I am new to web technology.

I would like to implement autocomplete/suggestion in the user search box
as the user types (like Google, for example). I am using
Solr as the database. Basically I am familiar with Solr and I can formulate
suggestion queries.

But now I don't know how to implement suggestions in the user interface.
Which technologies do I need? The website is in PHP. Any suggestions,
examples, or basic tutorials are welcome. Thank you.



Regards
Olivier


Re: SimplePostTool with extracted Outlook messages

2015-01-26 Thread Alexandre Rafalovitch
Well, you are NOT posting to the same URL.
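
i.e. point the tool at the same endpoint the curl call used, e.g. (single file, id passed
as a literal; untested):

java -classpath dist/solr-core-4.10.3.jar -Dtype=application/vnd.ms-outlook
-Durl="http://localhost:8983/solr/update/extract?literal.id=6252671B765A1748992DF1A6403BDF81A4A15E00&commit=true"
org.apache.solr.util.SimplePostTool C:/temp/samplemsg/6252671B765A1748992DF1A6403BDF81A4A15E00.msg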


On 26 January 2015 at 17:00, Mark  wrote:
> http://localhost:8983/solr/update




Sign up for my Solr resources newsletter at http://www.solr-start.com/


Showing distance in results

2015-01-26 Thread vit
I have Solr 4.2
I need to calculate the distance between a point (0, 0) and lat lng in each
document. I do this
http://:9081/solr/collection1/select?q={!func}dist(2, lat, lng, 0,
0)&wt=xml&indent=true

It works fine but does not show the distance. Please help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Showing-distance-in-results-tp4182077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: replicas goes in recovery mode right after update

2015-01-26 Thread Erick Erickson
 [BD][RecoveryThread]: applyDeletes: no deletes; skipping
> 14:13:54,426 INFO  [org.apache.solr.update.LoggingInfoStream]
> (RecoveryThread) [BD][RecoveryThread]: prune sis=segments_9t:
> _4qe(4.10.0):C4312879/1370002
>
>
> lrx334p.qa.ch3.s.com:8680/solr/search1_shard7_replica11/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=script&update.chain=removeDuplicates
> }
> status=0 QTime=0
> 14:16:49,279 INFO  [org.apache.solr.core.SolrCore] (http-/10.235.47.41:8580
> -1)
> [search1_shard7_replica4] webapp=/solr path=/update params={distrib.from=
>
> http://solrx334p.qa.ch3.s.com:8680/solr/search1_shard7_replica11/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=script&update.chain=removeDuplicates
> }
> status=0 QTime=0
> 14:16:49,283 INFO  [org.apache.solr.update.UpdateHandler]
> (recoveryExecutor-7-thread-1) start
>
> commit{flags=2,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]: commit:
> start
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]: commit:
> enter lock
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]: commit:
> now prepare
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]:
> prepareCommit: flush
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IW][recoveryExecutor-7-thread-1]:   index
> before flush _4qe(4.10.0):C4312879/1370002:delGen=56
> _554(4.10.0):C3995865/780418:delGen=23 _56u(4.10.0):C286775/11906:delGen=15
> _5co(4.10.0):C871785/93841:delGen=10 _5m7(4.10.0):C122852/31675:delGen=11
> _5hm(4.10.0):C457977/32535:delGen=11 _5q0(4.10.0):C13610/649:delGen=6
> _5kb(4.10.0):C424868/19149:delGen=11 _5f5(4.10.0):C116528/42495:delGen=1
> _5nx(4.10.0):C33236/20668:delGen=1 _5qm(4.10.0):C29770/1:delGen=1
> _5o8(4.10.0):C27155/7531:delGen=1 _5of(4.10.0):C38545/5677:delGen=1
> _5p7(4.10.0):C37457/648:delGen=1 _5qv(4.10.0):C3973
> _5q1(4.10.0):C402/1:delGen=1 _5q2(4.10.0):C779 _5qa(4.10.0):C967
> _5qc(4.10.0):C1828 _5qh(4.10.0):C1765 _5qi(4.10.0):C1241 _5qq(4.10.0):C1997
> _5qr(4.10.0):C1468 _5qp(4.10.0):C1729 _5qo(4.10.0):C3456/1:delGen=1
> _5qu(4.10.0):C27 _5qt(4.10.0):C30 _5qx(4.10.0):C638 _5qy(4.10.0):C1407
> _5qw(4.10.0):C802 _5r2(4.10.0):C32769/1:delGen=1 _5r3(4.10.0):C26057
> _5r4(4.10.0):C23934/1:delGen=1
> 14:16:49,283 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
> startFullFlush
> 14:16:49,284 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
> anyChanges? numDocsInRam=24222 deletes=true hasTickets:false
> pendingChangesInFullFlush: false
> 14:16:49,284 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWFC][recoveryExecutor-7-thread-1]:
> addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0 24221
> deleted terms (unique count=24220) bytesUsed=4455794, segment=_5r5,
> aborting=false, numDocsInRAM=24222, deleteQueue=DWDQ: [ generation: 1 ]]
> 14:16:49,322 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flush
> postings as segment _5r5 numDocs=24222
>
> ==> gc.20150126-135638.log <==
> 1211.362: [GC1211.362: [ParNew: 966947K->88429K(996800K), 0.0191260 secs]
> 1499845K->633603K(1995752K), 0.0192710 secs] [Times: user=0.20 sys=0.00,
> real=0.02 secs]
>
> ==> server.log <==
> 14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: new
> segment has 1 deleted docs
> 14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: new
> segment has no vectors; no norms; no docValues; prox; freqs
> 14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]:
> flushedFiles=[_5r5_Lucene41_0.pos, _5r5.fdx, _5r5.fnm, _5r5.fdt,
> _5r5_Lucene41_0.tim, _5r5_Lucene41_0.tip, _5r5_Lucene41_0.doc]
> 14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DWPT][recoveryExecutor-7-thread-1]: flushed
> codec=Lucene410
> 14:16:50,387 INFO  [org.apache.solr.update.LoggingInfoStream]
> (reco

Re: SimplePostTool with extracted Outlook messages

2015-01-26 Thread Mark
Fantastic - that explains it

Adding -Durl="
http://localhost:8983/solr/update/extract?commit=true&overwrite=true";

Gets me a little further

POSTing file 6252671B765A1748992DF1A6403BDF81A4A22E00.msg
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
http://localhost:8983/solr/update/extract?commit=true&overwrite=true
SimplePostTool: WARNING: Response: 

<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">44</int></lst>
  <lst name="error"><str name="msg">Document is missing mandatory uniqueKey
  field: id</str><int name="code">400</int></lst>
</response>


However, that's not much use when recursing a directory, since the URL
essentially has to change to pass the document ID for each file

I think I may just extend SimplePostTool or look at using Solr Cell perhaps?
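
For what it's worth, the extract handler (Solr Cell) accepts the unique key
as a literal.<field> request parameter, so a small shell loop that posts
each file with its own literal.id may be all that's needed. A rough sketch,
assuming the stock /update/extract handler and the "id" uniqueKey from your
schema ("myfile" is an arbitrary form field name):

  # post each .msg with its filename as the id, then commit once at the end
  # (filenames with special characters would need URL-encoding first)
  for f in *.msg; do
    curl "http://localhost:8983/solr/update/extract?literal.id=$f" \
         -F "myfile=@$f"
  done
  curl "http://localhost:8983/solr/update?commit=true"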



On 26 January 2015 at 22:14, Alexandre Rafalovitch 
wrote:

> Well, you are NOT posting to the same URL.
>
>
> On 26 January 2015 at 17:00, Mark  wrote:
> > http://localhost:8983/solr/update
>
>
>
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>


Re: Showing distance in results

2015-01-26 Thread Erick Erickson
A very small bit of Googling yields:
https://wiki.apache.org/solr/SpatialSearch

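On Solr 4.x you can also return the function value as a pseudo-field in fl
and sort on it, along these lines (a sketch only; it assumes lat and lng are
single-valued numeric fields, and the host/core names are placeholders):

  http://yourhost:9081/solr/collection1/select?q=*:*
      &fl=*,distance:dist(2,lat,lng,0,0)
      &sort=dist(2,lat,lng,0,0)+asc
      &wt=xml&indent=true
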
Best,
Erick

On Mon, Jan 26, 2015 at 2:18 PM, vit  wrote:

> I have Solr 4.2
> I need to calculate the distance between a point (0, 0) and lat lng in each
> document. I do this
> http://:9081/solr/collection1/select?q={!func}dist(2, lat, lng, 0,
> 0)&wt=xml&indent=true
>
> It works fine but does not show the distance, Please help.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Showing-distance-in-results-tp4182077.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr cloud replicas goes in recovery mode after update

2015-01-26 Thread Mark Miller
bq. Is this the correct approach ?

It works, but it might not be ideal. Recent versions of ZooKeeper have an
alternate config for this max limit though, and it is preferable to use
that.

See maxSessionTimeout in
http://zookeeper.apache.org/doc/r3.3.1/zookeeperAdmin.html
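
For example, in zoo.cfg (a sketch only; the values are illustrative):

  tickTime=2000
  # server-side cap on negotiated client session timeouts;
  # it defaults to 20 * tickTime when not set
  maxSessionTimeout=90000

That lets a larger zkClientTimeout on the Solr side actually be honored
without inflating tickTime itself.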

- Mark

On Mon Jan 26 2015 at 4:44:41 PM Vijay Sekhri  wrote:

> Hi Erick,
>
> The older message seems to be deleted so I am sending a new one
> http://osdir.com/ml/solr-user.lucene.apache.org/2015-01/msg00773.html
>
>
> In the solr.xml file I had the zk timeout set to *<int
> name="zkClientTimeout">${zkClientTimeout:45}</int>*
> One thing that made it a bit better now is the zk tickTime and syncLimit
> settings. I set them to higher values as below. This may not be advisable
> though.
>
> tickTime=3
> initLimit=30
> syncLimit=20
>
> Now we observe that replicas do not go into recovery as often as before.
> In the whole cluster at a given time I would have a couple of replicas in
> recovery, whereas earlier it was multiple replicas from every shard.
> The FAQ on the wiki https://wiki.apache.org/solr/SolrCloud says "The maximum
> is 20 times the tickTime.", so I decided to increase the tickTime. Is this
> the correct approach?
>
> One question I have is whether the auto commit settings have anything to do
> with this or not. Do they induce extra work for the searchers that could
> cause this? I have tried with the following settings:
> *<autoSoftCommit>*
> *  <maxDocs>50</maxDocs>*
> *  <maxTime>90</maxTime>*
> *</autoSoftCommit>*
>
> *<autoCommit>*
> *  <maxDocs>20</maxDocs>*
> *  <maxTime>3</maxTime>*
> *  <openSearcher>false</openSearcher>*
> *</autoCommit>*
>
> I have increased the heap size to 15GB for each JVM instance. I monitored
> the heap usage during full indexing and it never goes beyond 8 GB. I don't
> see any Full GC happening at any point.
>
>
> Our rate is variable. It is not a sustained rate of 6000/second; there are
> intervals where it reaches that much, comes down, and grows again. So if I
> took an average it would be only 600/second, but that is not the real rate
> at any given time.
> The SolrCloud version is *4.10*. All indexers are basically Java programs
> running on different hosts using the CloudSolrServer API.
> As I mentioned it is much better now than before, however not completely as
> expected. If possible we would like to have none of them go into recovery.
>
> I captured some logs before and after recovery
>
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
> "_5r2_1.del"
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec
> to
> checkpoint
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: now
> checkpoint "_4qe(4.10.0):C4312879/1nts ; isCommit = false]
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
> "_5r4_1.del"
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec
> to
> checkpoint
> 14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
> recoveryExecutor-7-thread-1 finishFullFlush success=true
> 14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> findMerges: 34 segments
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_554(4.10.0):C3995865/780418:delGen=23 size=3669.307 MB [skip: too
> large]
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_4qe(4.10.0):C4312879/1370113:delGen=57 size=3506.254 MB [skip: too
> large]
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5co(4.10.0):C871785/93995:delGen=11 size=853.668 MB
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5kb(4.10.0):C424868/49572:delGen=12 size=518.704 MB
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5hm(4.10.0):C457977/83353:delGen=12 size=470.422 MB
> 14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_56u(4.10.0):C286775/11906:delGen=15 size=312.952 MB
> 14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5f5(4.10.0):C116528/43621:delGen=2 size=95.529 MB
> 14

Re: replicas goes in recovery mode right after update

2015-01-26 Thread Shawn Heisey
On 1/26/2015 2:26 PM, Vijay Sekhri wrote:
> Hi Erick,
> In the solr.xml file I had the zk timeout set to /<int
> name="zkClientTimeout">${zkClientTimeout:45}</int>/
> One thing that made it a bit better now is the zk tickTime and
> syncLimit settings. I set them to higher values as below. This may not
> be advisable though.
>
> tickTime=3
> initLimit=30
> syncLimit=20
>
> Now we observed that replicas do not go in recovery that often as
> before. In the whole cluster at a given time I would have a couple of
> replicas in recovery whereas earlier it were multiple replicas from
> every shard .
> The FAQ on the wiki https://wiki.apache.org/solr/SolrCloud says "The
> maximum is 20 times the tickTime.", so I decided to increase
> the tickTime. Is this the correct approach?

The default zkClientTimeout on recent Solr versions is 30 seconds, up
from 15 in slightly older releases.

Those values of 15 or 30 seconds are a REALLY long time in computer
terms, and if you are exceeding that timeout on a regular basis,
something is VERY wrong with your Solr install.  Rather than take steps
to increase your timeout beyond the normal maximum of 40 seconds (20
times a tickTime of 2 seconds), figure out why you're exceeding that
timeout and fix the performance problem.  The zkClientTimeout value that
you have set, 450 seconds, is seven and a half *MINUTES*.  Nothing in
Solr should ever take that long.

"Not enough memory in the server" is by far the most common culprit for
performance issues.  Garbage collection pauses are a close second.

I don't actually know this next part for sure, because I've never looked
into the code, but I believe that increasing the tickTime, especially to
a value 15 times higher than default, might make all zookeeper
operations a lot slower.

Thanks,
Shawn



Re: How to implement Auto complete, suggestion client side

2015-01-26 Thread Alexandre Rafalovitch
You've got a lot of options depending on what you want. But since you
seem to just want _an_ example, you can use mine from
http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search
box there).

You can see the source for the test screen (using Spring Boot and
Spring Data Solr as a middle-layer) and Select2 for the UI at:
https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer.
The Solr definition is at:
https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf

Other implementation pieces are in that (and another) public
repository as well, but it's all in Java. You'll probably want to do
something similar in PHP.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 26 January 2015 at 17:11, Olivier Austina  wrote:
> Hi All,
>
> I would say I am new to web technology.
>
> I would like to implement auto complete/suggestion in the user search box
> as the user type in the search box (like Google for example). I am using
> Solr as database. Basically I am  familiar with Solr and I can formulate
> suggestion queries.
>
> But now I don't know how to implement suggestion in the User Interface.
> Which technologies should I need. The website is in PHP. Any suggestions,
> examples, basic tutorial is welcome. Thank you.
>
>
>
> Regards
> Olivier


An interesting approach to grouping

2015-01-26 Thread Ryan Josal
I have an index of products, and these products have a "category" which we
can say for now is a good approximation of its location in the store.  I'm
investigating altering the ordering of the results so that the categories
aren't interlaced as much... so that the results are a little bit more
grouped by category, but not *totally* grouped by category.  It's
interesting because it's an approach that sort of compares results to
near-scored/ranked results.  One of the hoped outcomes of this would that
there would be somewhat fewer categories represented in the top results for
a given query, although it is questionable if this is a good measurement to
determine the effectiveness of the implementation.

My first attempt was to
group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax
v=$q}),0,20))

Or some FunctionQuery like that, so that in order to become a member of a
group, the doc would have to have the same category, and be dropped into
the same score bucket (20 in this case).  This doesn't work out of the gate
due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException\n\tat
org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)\n\tat
org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)\n\tat
org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)\n\tat
org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)\n\tat
org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)\n\tat


Has anyone tried something like this before, and does anyone have any novel
ideas for how to approach it, no matter how different?  How about a
workaround for the group.func error here?  I'm very open-minded about where
to go on this one.

Thanks,
Ryan


Re: How to implement Auto complete, suggestion client side

2015-01-26 Thread Dan Davis
Cannot get any easier than jquery-ui's autocomplete widget -
http://jqueryui.com/autocomplete/

Basically, you set some classes and implement a JavaScript handler that calls
the server to get the autocomplete data.   I would never expose Solr directly
to browsers, so I would have the AJAX call go to a PHP script (or a
function/method if you are using a web framework such as CakePHP or
Symfony).

Then, on the server, you make a request to Solr /suggest or /spell with
wt=json, and then you reformulate this into a simple JSON response that is
a simple array of options.

You can do this in stages:

   - Constant suggestions - you change your html and implement Javascript
   that shows constant suggestions after, for instance, 2 seconds.
   - Constant suggestions from the server - you change your JavaScript to
   call the server, and have the server return a constant list.
   - Dynamic suggestions from the server - you implement the server-side to
   query Solr and turn the return from /suggest or /spell into a JSON array.
   - Tuning, tuning, tuning - you work hard on tuning it so that you get
   high quality suggestions for a wide variety of inputs.

Note that the autocomplete I've described for you is basically the simplest
thing possible, as you suggest you are new to it.   It is not based on data
mining of query and click-through logs, which is a very common pattern
these days.   There is no bolding of the portion of the words that are new.
  It is just a basic autocomplete widget with a delay.
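
For a concrete starting point, the client side can be as small as this (a
rough sketch; the input id "q" and the endpoint /suggest.php are made-up
names, and the PHP script is assumed to return a plain JSON array of strings
built from Solr's /suggest or /spell response):

  $("#q").autocomplete({
    minLength: 2,   // don't fire on the first keystroke
    delay: 300,     // wait a bit while the user is still typing
    source: function (request, response) {
      // request.term is what the user has typed so far;
      // response() expects an array of suggestion strings
      $.getJSON("/suggest.php", { term: request.term }, response);
    }
  });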

On Mon, Jan 26, 2015 at 5:11 PM, Olivier Austina 
wrote:

> Hi All,
>
> I would say I am new to web technology.
>
> I would like to implement auto complete/suggestion in the user search box
> as the user type in the search box (like Google for example). I am using
> Solr as database. Basically I am  familiar with Solr and I can formulate
> suggestion queries.
>
> But now I don't know how to implement suggestion in the User Interface.
> Which technologies should I need. The website is in PHP. Any suggestions,
> examples, basic tutorial is welcome. Thank you.
>
>
>
> Regards
> Olivier
>


Solr admin Url issues

2015-01-26 Thread Summer Shire
Hi All,

Running solr (4.7.2) locally and hitting the admin page like this works just
fine: http://localhost:8983/solr/#

But on my deployment server my path is http://example.org/jetty/MyApp/1/solr/#
Or http://example.org/jetty/MyApp/1/solr/admin/cores or
http://example.org/jetty/MyApp/1/solr/main/admin/

The above request in a browser loads the admin page halfway and then spawns
another request at
http://example.org/solr/admin/cores ...

how can I maintain my other params such as jetty/MyApp/1/

btw http://example.org/jetty/MyApp/1/solr/main/select?q=*:* or any other
request handlers work just fine.

What is going on here? Any ideas?

thanks,
Summer

Re: Solr admin Url issues

2015-01-26 Thread Dan Davis
Is Jetty actually running on port 80?  Do you have an Apache2 reverse proxy
in front?

On Mon, Jan 26, 2015 at 11:02 PM, Summer Shire 
wrote:

> Hi All,
>
> Running solr (4.7.2) locally and hitting the admin page like this works
> just fine http://localhost:8983/solr/ # <
> http://localhost:8983/solr/#>
>
> But on my deployment server my path is
> http://example.org/jetty/MyApp/1/solr/# <
> http://example.org/jetty/MyApp/1/solr/#>
> Or http://example.org/jetty/MyApp/1/solr/admin/cores <
> http://example.org/jetty/MyApp/1/solr/admin/cores> or
> http://example.org/jetty/MyApp/1/solr/main/admin/ <
> http://example.org/jetty/MyApp/1/solr/main/admin/>
>
> the above request in a browser loads the admin page half way and then
> spawns another request at
> http://example.org/solr/admin/cores  >….
>
> how can I maintain my other params such as jetty/MyApp/1/
>
> btw http://example.org/jetty/MyApp/1/solr/main/select?q=*:* <
> http://example.org/jetty/MyApp/1/solr/main/select?q=*:*> or any other
> requesthandlers work just fine.
>
> What is going on here ? any idea ?
>
> thanks,
> Summer


Re: Solr admin Url issues

2015-01-26 Thread Summer Shire
Jetty is not running on port 80;
it is running on ports that I defined for my instances in sequence.
And no, I do not have an Apache2 reverse proxy in front :(



> On Jan 26, 2015, at 8:18 PM, Dan Davis  wrote:
> 
> Is Jetty actually running on port 80?Do you have Apache2 reverse proxy
> in front?
> 
> On Mon, Jan 26, 2015 at 11:02 PM, Summer Shire 
> wrote:
> 
>> Hi All,
>> 
>> Running solr (4.7.2) locally and hitting the admin page like this works
>> just fine http://localhost:8983/solr/ # <
>> http://localhost:8983/solr/#>
>> 
>> But on my deployment server my path is
>> http://example.org/jetty/MyApp/1/solr/# <
>> http://example.org/jetty/MyApp/1/solr/#>
>> Or http://example.org/jetty/MyApp/1/solr/admin/cores <
>> http://example.org/jetty/MyApp/1/solr/admin/cores> or
>> http://example.org/jetty/MyApp/1/solr/main/admin/ <
>> http://example.org/jetty/MyApp/1/solr/main/admin/>
>> 
>> the above request in a browser loads the admin page half way and then
>> spawns another request at
>> http://example.org/solr/admin/cores >> ….
>> 
>> how can I maintain my other params such as jetty/MyApp/1/
>> 
>> btw http://example.org/jetty/MyApp/1/solr/main/select?q=*:* <
>> http://example.org/jetty/MyApp/1/solr/main/select?q=*:*> or any other
>> requesthandlers work just fine.
>> 
>> What is going on here ? any idea ?
>> 
>> thanks,
>> Summer



Re: replicas goes in recovery mode right after update

2015-01-26 Thread Vijay Sekhri
Hi Shawn, Erick
So it turned out that once we increased our indexing rate back to the
original full indexing rate, the replicas went back into recovery no matter
what the zk timeout setting was. Initially we thought that increasing the
timeout was helping, but apparently not. We had just decreased the indexing
rate, and that caused fewer replicas to go into recovery. Once we were back
at our full indexing rate, almost all replicas went into recovery no matter
what the zk timeout or tickTime settings were. We reverted the tickTime back
to the original 2 seconds.

So we investigated further, and after checking the logs we found this
exception happening right before the recovery process is initiated. We
observed this on two different replicas that went into recovery. We are not
sure if this is a coincidence or a real problem. Note that we were also
putting some search query load on the cluster while indexing to trigger the
recovery behavior.

22:00:32,493 INFO  [org.apache.solr.cloud.RecoveryStrategy]
(rRecoveryThread) Finished recovery process. core=search1_shard5_replica2
22:00:32,503 INFO  [org.apache.solr.common.cloud.ZkStateReader]
(zkCallback-2-thread-66) A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
occurred - updating... (live nodes size: 22)
22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
(http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: trigger
flush: activeBytes=101796784 deleteBytes=3061644 vs limit=104857600
22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
(http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: thread state
has 12530488 bytes; docInRAM=2051
22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
(http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: thread state
has 12984633 bytes; docInRAM=2205


22:00:40,861 ERROR [org.apache.solr.core.SolrCore] (http-/10.235.46.36:8580-32)
ClientAbortException: * java.io.IOException: JBWEB002020: Invalid chunk
header*
at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:351)
at
org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:422)
at
org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:373)
at
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
at
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
at
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at
org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:192)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:111)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
at
org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:559)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:336)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
  at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:920)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: JBWEB002020: Invalid chunk header
at
org.apache.coyote.http11.filters.ChunkedInputFilter.parseChunkHeader(ChunkedI

Re: replicas goes in recovery mode right after update

2015-01-26 Thread Vijay Sekhri
Hi Shawn, Erick
From another replica, right after the same error, it seems the leader
initiates the recovery of the replica. This one has slightly different log
information than the other one that went into recovery. I am not sure if
this helps in diagnosing.

Caused by: java.io.IOException: JBWEB002020: Invalid chunk header
at
org.apache.coyote.http11.filters.ChunkedInputFilter.parseChunkHeader(ChunkedInputFilter.java:281)
at
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:134)
at
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:697)
at org.apache.coyote.Request.doRead(Request.java:438)
at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:341)
... 31 more

21:55:07,678 INFO  [org.apache.solr.handler.admin.CoreAdminHandler]
(http-/10.235.43.57:8680-32) It has been requested that we recover:
core=search1_shard4_replica13
21:55:07,678 INFO  [org.apache.solr.servlet.SolrDispatchFilter]
(http-/10.235.43.57:8680-32) [admin] webapp=null path=/admin/cores
params={action=REQUESTRECOVERY&core=search1_shard4_replica13&wt=javabin&version=2}
status=0 QTime=0
21:55:07,678 INFO  [org.apache.solr.cloud.ZkController] (Thread-443)
publishing core=search1_shard4_replica13 state=recovering collection=search1
21:55:07,678 INFO  [org.apache.solr.cloud.ZkController] (Thread-443)
numShards not found on descriptor - reading it from system property
21:55:07,681 INFO  [org.apache.solr.cloud.ZkController] (Thread-443) Wrote
recovering to /collections/search1/leader_initiated_recovery
/shard4/core_node192


On Mon, Jan 26, 2015 at 10:34 PM, Vijay Sekhri 
wrote:

> Hi Shawn, Erick
> So it turned out that once we increased our indexing rate to the original
> full indexing rate  the replicas went back into recovery no matter what the
> zk timeout setting was. Initially we though that increasing the timeout is
> helping but apparently not . We just decreased indexing rate and that
> caused less replicas to go in recovery. Once we have our full indexing rate
> almost all replicas went into recovery no matter what the zk timeout or the
> ticktime setting were. We reverted back the ticktime to original 2 seconds
>
> So we investigated further and after checking the logs we found this
> exception happening right before the recovery process is initiated. We
> observed this on two different replicas that went into recovery. We are not
> sure if this is a coincidence or a real problem . Notice we were also
> putting some search query load while indexing to trigger the recovery
> behavior
>
> 22:00:32,493 INFO  [org.apache.solr.cloud.RecoveryStrategy]
> (rRecoveryThread) Finished recovery process. core=search1_shard5_replica2
> 22:00:32,503 INFO  [org.apache.solr.common.cloud.ZkStateReader]
> (zkCallback-2-thread-66) A cluster state change: WatchedEvent
> state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
> occurred - updating... (live nodes size: 22)
> 22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
> (http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: trigger
> flush: activeBytes=101796784 deleteBytes=3061644 vs limit=104857600
> 22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
> (http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: thread
> state has 12530488 bytes; docInRAM=2051
> 22:00:40,450 INFO  [org.apache.solr.update.LoggingInfoStream]
> (http-/10.235.46.36:8580-27) [FP][http-/10.235.46.36:8580-27]: thread
> state has 12984633 bytes; docInRAM=2205
>
>
> 22:00:40,861 ERROR [org.apache.solr.core.SolrCore] 
> (http-/10.235.46.36:8580-32)
> ClientAbortException: * java.io.IOException: JBWEB002020: Invalid chunk
> header*
> at
> org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:351)
> at
> org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:422)
> at
> org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:373)
> at
> org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
> at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
> at
> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
> at
> org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:192)
> at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:111)
> at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
> at
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
> at
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.j

Re: Solr admin Url issues

2015-01-26 Thread Shawn Heisey
On 1/26/2015 9:02 PM, Summer Shire wrote:
> Running solr (4.7.2) locally and hitting the admin page like this works just 
> fine http://localhost:8983/solr/ # 
>  
> 
> But on my deployment server my path is 
> http://example.org/jetty/MyApp/1/solr/# 
> 
> Or http://example.org/jetty/MyApp/1/solr/admin/cores 
>  or  
> http://example.org/jetty/MyApp/1/solr/main/admin/ 
> 
> 
> the above request in a browser loads the admin page half way and then spawns 
> another request at
> http://example.org/solr/admin/cores ….
> 
> how can I maintain my other params such as jetty/MyApp/1/
> 
> btw http://example.org/jetty/MyApp/1/solr/main/select?q=*:* 
>  or any other 
> requesthandlers work just fine.

Is this the scenario that got mentioned on the IRC channel?  That was
indicated to be behind a proxy.

If Solr is behind a proxy that changes the URL path (which is the only
way I can imagine this is working on port 80), the proxy must also
rewrite the URLs that the Solr admin UI sends to the user's browser.
Those URLs are embedded in the data (definitely javascript, and possibly
html) that the admin UI sends to the user's browser, and are created by
information available to Solr.  Solr will have no idea what the path on
the proxy is.
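
The usual way people sidestep that is to keep the external path identical to
Solr's own path, so the URLs baked into the admin UI still resolve, for
example with Apache httpd mod_proxy (a sketch only; paths and ports are
illustrative):

  ProxyPass        /solr http://localhost:8983/solr
  ProxyPassReverse /solr http://localhost:8983/solr

A path-changing mapping such as /jetty/MyApp/1/solr would additionally need
something that rewrites URLs inside the response bodies (mod_proxy_html, for
instance), because ProxyPassReverse only adjusts response headers.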

Thanks,
Shawn



Re: replicas goes in recovery mode right after update

2015-01-26 Thread Shawn Heisey
On 1/26/2015 9:34 PM, Vijay Sekhri wrote:
> Hi Shawn, Erick
> So it turned out that once we increased our indexing rate to the original
> full indexing rate  the replicas went back into recovery no matter what the
> zk timeout setting was. Initially we though that increasing the timeout is
> helping but apparently not . We just decreased indexing rate and that
> caused less replicas to go in recovery. Once we have our full indexing rate
> almost all replicas went into recovery no matter what the zk timeout or the
> ticktime setting were. We reverted back the ticktime to original 2 seconds
> 
> So we investigated further and after checking the logs we found this
> exception happening right before the recovery process is initiated. We
> observed this on two different replicas that went into recovery. We are not
> sure if this is a coincidence or a real problem . Notice we were also
> putting some search query load while indexing to trigger the recovery
> behavior



> 22:00:40,861 ERROR [org.apache.solr.core.SolrCore] 
> (http-/10.235.46.36:8580-32)
> ClientAbortException: * java.io.IOException: JBWEB002020: Invalid chunk
> header*

One possibility that my searches on that exception turned up is that
this is some kind of problem in the servlet container; the
information I can see suggests it may be a bug in JBoss, with the
underlying cause being changes in newer releases of Java 7.  Your
stacktraces do seem to mention jboss classes, so that seems likely.  The
reason that we only recommend running under the Jetty that comes with
Solr, which has a tuned config, is because that's the only servlet
container that actually gets tested.

https://bugzilla.redhat.com/show_bug.cgi?id=1104273
https://bugzilla.redhat.com/show_bug.cgi?id=1154028

I can't really verify any other possibility.

Thanks,
Shawn



Want multiple df field on suggestion component.

2015-01-26 Thread Nitin Solanki
Hi,
   I have created 2 fields, "ngram" and "count".
ngram => stores 1- to 5-grams of words or phrases.
count => stores the frequency of each ngram.

I am applying df on the ngram field in the suggestion component. When I get
suggestions for a misspelled word, the word and freq come back, but now I
also need the count field in the suggestion block.
As far as I know, the suggestions work because I defined df on ngram. But now
I need to use the count field in the suggestions. Is there any way to define
count as a df field too, along with ngram? Or is there another method to
achieve the same thing?