Re: Separate logging for Solr updatereuesthandler

2014-11-20 Thread Shawn Heisey
On 11/20/2014 10:51 PM, solr2020 wrote: > we have a Solr (solr4.2) setup with Jetty web server and the events are > logged using log4j with the log level INFO. But here we would like to get > more details about the update request received by UpdateRequestHandler. So > is there any way to configure deb

Separate logging for Solr updatereuesthandler

2014-11-20 Thread solr2020
Hi, we have a Solr (solr4.2) setup with Jetty web server and the events are logged using log4j with the log level INFO. But here we would like to get more details about the update request received by UpdateRequestHandler. So is there any way to configure debug log level kind of stuff for Update requ

Re: Multiple facet.query ignored

2014-11-20 Thread Erick Erickson
This is totally weird. What version of Solr? Because the strangest thing is that the facet.query clause is getting lost on the way _in_. When I tried this query (5x, I'll admit), at least the echo params at the top had both facet queries, even when the lastsaveddate was undefined! In fact if I put

Re: Include Solr score into a ranking algorithm

2014-11-20 Thread Nicholas Ding
Thank you so much, Mikhail! It works perfectly. On Thu, Nov 20, 2014 at 12:54 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > On Thu, Nov 20, 2014 at 5:23 PM, Nicholas Ding > wrote: > > > Hi Mikhail, > > > > Thank you very much! I'm using eDisMax by default, I think I will need to >

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
Hi Joseph, Thank you for asking. If you want to do it in the interactive sense, it won't work well practically because it takes several minutes for learning. If you accept working in batch sense, the feature can be implemented, but I've not done it yet. I have the open ticket for that: accept f

Multiple facet.query ignored

2014-11-20 Thread nbosecker
Hi, I'm having problems with queries that have multiple facet.query fields. Per the docs: [http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields_and_Facet_Queries] http://localhost:8983/solr/select?q=video&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=pr
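For readers following this thread, here is a minimal sketch of how a request with repeated facet.query parameters can be assembled so that neither value is dropped on the client side. It assumes the host and handler path quoted in the post; the second facet query (`popularity:[10 TO *]`) is a hypothetical placeholder, not the one from the truncated URL above, and `build_facet_url` is an illustrative helper name. `echoParams=all` is added so the response header shows which parameters Solr actually received.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_url(base_url, q, facet_field, facet_queries):
    """Build a Solr select URL that repeats facet.query once per query."""
    # A list of (key, value) pairs is used instead of a dict, because a dict
    # can hold only one value per key and would silently drop the repeats.
    params = [
        ("q", q),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("echoParams", "all"),
    ]
    params.extend(("facet.query", fq) for fq in facet_queries)
    return base_url + "?" + urlencode(params)


# The second facet query is a hypothetical example, not taken from the post.
facet_queries = ["price:[* TO 500]", "popularity:[10 TO *]"]
url = build_facet_url(
    "http://localhost:8983/solr/select", "video", "inStock", facet_queries
)
print(url)
```

If both facet queries appear in the echoed parameters but only one comes back in the facet counts, the problem is more likely on the server side than in how the request was built.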

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
Thanks Glen for the URL. I'd like to check it when I am available. Thanks Paul for giving me the difference between them. I like your description! Koji (2014/11/21 2:18), Paul Libbrecht wrote: > As far as I could tell, word2vec seems more mathematical, which is rather > nice. > At least I see m

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Joseph Obernberger
Hi Koji - is it possible to execute word2vec on a subset of documents from Solr? - ie could I run a query, get back the top n results and pass only those to word2vec? Will this work with Solr Cloud? Thank you! -Joe On Thu, Nov 20, 2014 at 12:18 PM, Paul Libbrecht wrote: > As far as I could t

solr 3: multivalued field omitTermFreqAndPositions is ignored

2014-11-20 Thread qrcde
Hello, We have a 20-million-document Solr index which contains multiValued fields of numbers. We have omitNorms="true" omitTermFreqAndPositions="true", but it looks like Solr is still calculating idf and tf for the field. Is there any other way except creating custom similarity to fix this issue? I just want

Can SOLR map query terms one-to-one with matched terms?

2014-11-20 Thread Matthew Gwynne
Hi, I am currently working on a people search tool using SOLR to facilitate the indexing + fuzzy search across multiple fields (with edismax), using various filters such as SynonymFilterFactory, WordDelimiterFactory etc and disabling TF-IDF. This works very well, except for a few cases where a s

Re: Include Solr score into a ranking algorithm

2014-11-20 Thread Mikhail Khludnev
On Thu, Nov 20, 2014 at 5:23 PM, Nicholas Ding wrote: > Hi Mikhail, > > Thank you very much! I'm using eDisMax by default, I think I will need to > change it to defType=func and I wonder why do you ask, because the given link has three examples of including edismax into the simple calculation.

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
As far as I could tell, word2vec seems more mathematical, which is rather nice. At least I see more transparent math in the web-page. Maybe this helps a bit? SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a q

Re: Handling growth

2014-11-20 Thread Erick Erickson
Oversharding is another option that punts the ball further down the road, but 5 years from now somebody _else_ will have to deal with it ;)... You can host multiple shards on a single Solr. So say you think you'll need 20 shards in 5 years (or whatever). Start with 20 shards on your single machine
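As a rough illustration of the oversharding idea, the sketch below builds a Collections API CREATE request that places all 20 shards on one node. It is only a sketch under assumptions: the post does not say how the shards are created, the collection name `mycollection` and the host are hypothetical, and `build_create_url` is an illustrative helper. The `numShards`, `replicationFactor`, and `maxShardsPerNode` parameters are the ones the Solr 4.x Collections API accepts.

```python
from urllib.parse import urlencode, parse_qs


def build_create_url(solr_base, collection, num_shards, nodes=1):
    """Build a Collections API CREATE URL for an oversharded collection."""
    # Ceiling division: how many shards each node must be allowed to host
    # so that num_shards fit on the given number of nodes.
    shards_per_node = -(-num_shards // nodes)
    params = [
        ("action", "CREATE"),
        ("name", collection),
        ("numShards", str(num_shards)),
        ("replicationFactor", "1"),
        ("maxShardsPerNode", str(shards_per_node)),
    ]
    return solr_base + "/admin/collections?" + urlencode(params)


# Hypothetical host and collection name; 20 shards on a single machine.
create_url = build_create_url("http://localhost:8983/solr", "mycollection", 20)
print(create_url)
```

Because the shard count is fixed at creation time in this approach, later growth means moving existing shards to new machines rather than re-indexing.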

Re: SOLR not starting after restart 2 node cloud setup

2014-11-20 Thread Erick Erickson
Doss: Tomcat often puts things in "catalina.out", you might check there, I've often seen logging information from Solr go there by default. Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though,

elevate.xml not getting updated after refreshing the solr admin page in solr cloud env.

2014-11-20 Thread rahulmodi
Hi All, I want to use the Boost feature by using the elevate.xml file. It works perfectly on my local system because locally there is no cloud server or ZooKeeper. Here when I update the elevate.xml file and refresh the Solr admin page it instantly reflects the changes in this file and after taking reload of par

solr cloud config files not getting updated even without restarting server

2014-11-20 Thread rahulmodi
Hi All, I want to use the Boost feature by using the elevate.xml file. It works perfectly on my local system because locally there is no cloud server or ZooKeeper. Here when I update the elevate.xml file and refresh the Solr admin page it instantly reflects the changes in this file and after taking reload of par

Re: Handling growth

2014-11-20 Thread Michael Della Bitta
The collections we index under this multi-collection alias do not use real time get, no. We have other collections behind single-collection aliases where get calls seem to work, but I'm not clear whether the calls are real time. Seems like it would be easy for you to test, but just be aware t

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Glen Newton
Hi Koji, Semantic vectors is here: http://code.google.com/p/semanticvectors/ It is a project that has been around for a number of years and used by many people (including me http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html ). If you could compare and contrast word2vec

Re: More HDFS and Shard Splitting

2014-11-20 Thread Joseph Obernberger
Just confirmed that you do need to create the core directory before doing the SHARDSPLIT (at least with HDFS) - otherwise it fails saying that it cannot find classes - like the cluster classes. I've noticed that the disk usage on HDFS goes up when I do the split - for example, if I split a 100G sh

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
Hi Paul, I cannot compare it to SemanticVectors as I don't know SemanticVectors. But word vectors that are produced by word2vec have interesting properties. Here is the description from the original word2vec web site: https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors I

Re: Include Solr score into a ranking algorithm

2014-11-20 Thread Ahmet Arslan
Hi Nicholas, you can use "sort by function" feature of solr. &sort=sum( mul(query(field:TfIdfQuery),x1), mul(x1,v2)) On Thursday, November 20, 2014 4:23 PM, Nicholas Ding wrote: Hi Mikhail, Thank you very much! I'm using eDisMax by default, I think I will need to change it to defType=func a

Re: Include Solr score into a ranking algorithm

2014-11-20 Thread Nicholas Ding
Hi Mikhail, Thank you very much! I'm using eDisMax by default, I think I will need to change it to defType=func and pass all the query parameters (fq mainly) to the sub query right? Nicholas Ding On Thu, Nov 20, 2014 at 5:22 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Hello Nic

Does HttpSolrServer support Secure data publishing

2014-11-20 Thread Danesh Kuruppu
Hi all, I am using Solr version 4.7.2 for indexing. I have some questions to be cleared. 1. Does the Solr server support other transport methods like Thrift etc.? 2. When we are using HttpSolrServer, how do we secure the data publishing? The use case is we need to restrict data publishing for specific

Re: Handling intersection facets of many values

2014-11-20 Thread Michael Sokolov
If you're willing to write some Java you can do something more efficient by intersecting two terms enumerations: this works with constant memory for any number of values in two fields, basically like intersecting any two sorted lists, you leap frog between them. I have an example if you're int
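The leapfrog idea described here can be sketched on plain sorted lists. This is a Python illustration of the general technique, not the Java example the author offers: each side seeks forward to the first value not smaller than the other side's current value, so long non-matching runs are skipped rather than scanned. In Lucene the seek step would presumably map to something like `TermsEnum.seekCeil`; that mapping is an assumption on my part, and `leapfrog_intersect` is an illustrative name.

```python
from bisect import bisect_left


def leapfrog_intersect(a, b):
    """Yield the values common to two sorted sequences, in order.

    Written as a generator so that no result list is built up;
    only the two cursor positions are kept in memory.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            yield a[i]
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Leap: jump to the first position in a whose value is >= b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Leap: jump to the first position in b whose value is >= a[i].
            j = bisect_left(b, a[i], j + 1)


# Hypothetical sorted value lists standing in for the terms of two fields.
common = list(leapfrog_intersect([1, 3, 5, 7, 9], [3, 4, 5, 9, 10]))
print(common)
```

Both inputs must already be sorted, which is the case for term enumerations since terms are stored in sorted order.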

Re: SOLR not starting after restart 2 node cloud setup

2014-11-20 Thread Doss
Dear Erick, Forgive my ignorance. Please find some of the details you required. *have you looked at the solr logs?* > Sorry I haven't defined the log4j.properties file, so I don't have solr logs. Since it requires tomcat restart I am planning to do it in next restart. But found the following

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi wrote: > Hello, > > It's my pleasure to share that I have an interesting tool "word2vec for > Lucene" > available at https://github.com/kojisekig/word2vec-lucene . > > As you can imagin

Re: Include Solr score into a ranking algorithm

2014-11-20 Thread Mikhail Khludnev
Hello Nicholas! You can specify a function query as the main query, where you can operate with DVs; then you can use the regular tfidf score from an arbitrary query as one of the arguments in the function query. See an example in http://wiki.apache.org/solr/FunctionQuery#query Have a good research! On Thu

Re: IndexSearcher not being closed

2014-11-20 Thread Yonik Seeley
On Wed, Nov 19, 2014 at 8:37 AM, Priya Rodrigues wrote: > public void setContext( TransformContext context ) { > try { > IndexReader reader = qparser.getReq().getSearcher().getIndexReader(); > ->Refcount incremented You can get a searcher from the request as many times as you like... it

[ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
Hello, It's my pleasure to share that I have an interesting tool "word2vec for Lucene" available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use "word2vec for Lucene" to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/compar

Re: Index complex JSON data in SOLR

2014-11-20 Thread Renaud Delbru
Hi David, you might want to look at SIREn 1.4 [1], a plugin for Lucene/Solr, that includes an update handler [2] which mimics the Elasticsearch index API. You can push JSON documents to the API and it will dynamically flatten and index the JSON documents into a set of fields (similar to Elasticsea

Re: Handling intersection facets of many values

2014-11-20 Thread Toke Eskildsen
On Wed, 2014-11-19 at 23:53 +0100, Peter Sturge wrote: > Yes, the 'lots-of-booleans' thing is a bit prohibitive as it won't > realistically scale to large value sets. "large" is extremely relative in Solr Land, but I would be wary of going beyond 10K. > 127.0.0.1:8983/solr/net/select?q=*:*&fl=de