Re: BM25 model for solr 4?

2012-11-14 Thread Сергей Бирюков
There is a good book: http://nlp.stanford.edu/IR-book/ See the chapter http://nlp.stanford.edu/IR-book/html/htmledition/okapi-bm25-a-non-binary-model-1.html 15.11.2012 06:16, Floyd Wu wrote: Hi there, Can anybody kindly tell me how to set up Solr to use BM25? By the way, are there any experime
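For readers who want to see what the BM25 chapter linked above computes, here is a minimal Python sketch of the scoring formula as that chapter presents it (idf taken as log(N/df), with k1 and b as tuning parameters). It is only an illustration, not Solr's or Lucene's implementation: the function name, the argument names, and the tiny corpus are all assumptions made up for this example, and k1=1.2, b=0.75 are just commonly used starting values.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with an Okapi BM25 style formula.

    Hypothetical helper for illustration only; idf is log(N / df_t),
    other BM25 variants use a different idf.
    """
    tf = Counter(doc_terms)
    # Document length normalisation: b=0 turns it off, b=1 applies it fully.
    length_norm = (1 - b) + b * (len(doc_terms) / avg_doc_len)
    score = 0.0
    for term in query_terms:
        df = doc_freq.get(term, 0)
        if df == 0 or tf[term] == 0:
            continue  # term absent from the corpus or from this document
        idf = math.log(num_docs / df)
        score += idf * ((k1 + 1) * tf[term]) / (k1 * length_norm + tf[term])
    return score


# Made-up three-document corpus, already tokenised.
docs = [["solr", "bm25", "ranking"], ["solr", "search"], ["lucene", "index"]]
doc_freq = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(doc) for doc in docs) / len(docs)

for doc in docs:
    print(doc, bm25_score(["solr", "bm25"], doc, doc_freq, len(docs), avg_len))
```

Only the first document contains both query terms, so it gets the highest score; the third contains neither and scores zero.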

Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-14 Thread David Smiley (@MITRE.org)
Tim, Combine them in "lat,lon" format using ScriptUpdateRequestProcessor using JavaScript. I'm doing this already in fact. See a template of an example that comes with Solr in update-script.js referenced by solrconfig.xml. I'd paste it right here if I had it but I have the excerpt for it on an
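The message above does the combining inside Solr, with JavaScript in a ScriptUpdateRequestProcessor. As a rough illustration of the same "lat,lon" concatenation, here is a Python sketch that does it on the client side before documents are sent for indexing. This is a swapped-in approach, not the update-script.js example that ships with Solr, and the field names `lat`, `lon`, and `location` are hypothetical.

```python
def to_lat_lon(lat, lon):
    """Join separate latitude/longitude values into one "lat,lon" string."""
    lat, lon = float(lat), float(lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"


def add_location(row, lat_key="lat", lon_key="lon", target="location"):
    """Return a copy of a database row with lat/lon replaced by one field.

    The key names are assumptions for this sketch, not Solr requirements.
    """
    doc = dict(row)
    doc[target] = to_lat_lon(doc.pop(lat_key), doc.pop(lon_key))
    return doc


print(add_location({"id": "1", "lat": 42.36, "lon": -71.06}))
```

The resulting `location` value is the single comma-separated string that a LatLon-typed field expects.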

Re: BM25 model for solr 4?

2012-11-14 Thread David Smiley (@MITRE.org)
See http://wiki.apache.org/solr/SchemaXml#Similarity class="solr.BM25SimilarityFactory" The factories for these have javadocs that document the parameters: http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/search/similarities/package-summary.html I don't know about comparisons betwee

Re: Solr Indexing MAX FILE LIMIT

2012-11-14 Thread mitra
Thank you, Eric. I didn't know that we could write a Java class for it. Can you provide me with some info on how to? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952p4020407.html Sent from the Solr - User mailing list archive at N

Re: Faceting Question

2012-11-14 Thread Jamie Johnson
Sorry some more info. I have a field to store source and another for date. I currently use faceting to get a temporal distribution across all sources. What is the best way to get a temporal distribution per source? Is the only thing I can do to execute 1 query for the list of sources and then an

Re: Does ICUFoldingFilterFactory make CJKWidthFilterFactory unnecessary?

2012-11-14 Thread Robert Muir
Yes, it's a subset. On Nov 14, 2012 1:18 PM, "Shawn Heisey" wrote: > I am using ICUFoldingFilterFactory in my Solr schema. Now I am looking at > adding CJKBigramFilterFactory, and I've noticed that it often goes with > CJKWidthFilterFactory. Here are the relevant Javadocs for my question: > > htt

BM25 model for solr 4?

2012-11-14 Thread Floyd Wu
Hi there, Can anybody kindly tell me how to set up Solr to use BM25? By the way, are there any experiments or research comparing BM25 with the classical VSM model in terms of recall/precision? Thanks in advance.

Re: consistency in SolrCloud replication

2012-11-14 Thread Mark Miller
It's included as soon as it has been indexed - though a request won't return until it's affected all replicas. Low latency eventual consistency. - Mark On Nov 14, 2012, at 5:47 PM, Bill Au wrote: > Will a newly indexed document be included in search results in the shard leader > as soon as it has

Re: Solr 4.0 - distributed updates without zookeeper?

2012-11-14 Thread Peter Wolanin
So, from looking at the code and talking to some of the Lucid guys today, it seems like there is no good way (currently) to control the shard leader selection, or even to "fail back" if the preferred leader server comes back up. We currently let indexing fail if the one master goes down, but addin

Re: Solr defining Schema structure trouble.

2012-11-14 Thread Jack Krupansky
You can break your books into individual pages, each a separate Solr "document", with the full page text as one tokenized text field value. Solr (Lucene) will take care of indexing the individual terms on each page. Then when you query on terms, Solr will find all pages that have the specified

Re: Nested Join Queries

2012-11-14 Thread Gerald Blanck
Mikhail- Let me know how to contribute a test case and I will put it on my to do list. When your many-to-many BlockJoin solution matures I would love to see it. Thanks. -Gerald On Tue, Nov 13, 2012 at 11:52 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Gerald, > Nice to hear the

Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-14 Thread dm_tim
Howdy, I now want to try my hand at spatial search. It looks fairly easy but I'm a bit puzzled about how to set up my schema.xml file. I know that my field must use the LatLon type but the columns of the database where I'll be pulling my data for indexing have separate lat and lon columns (both dou

Re: Internal Vs. External ZooKeeper

2012-11-14 Thread Anirudha Jadhav
Thanks mark ! On Sun, Nov 11, 2012 at 5:46 PM, Mark Miller wrote: > When SolrCloud is in a steady state (eg the number of nodes in the cluster > is not changing and config is not changing), Solr does not really talk to > ZooKeeper other than really light stuff like a heartbeat and maintaining a

Does ICUFoldingFilterFactory make CJKWidthFilterFactory unnecessary?

2012-11-14 Thread Shawn Heisey
I am using ICUFoldingFilterFactory in my Solr schema. Now I am looking at adding CJKBigramFilterFactory, and I've noticed that it often goes with CJKWidthFilterFactory. Here are the relevant Javadocs for my question: http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analy

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread iwo
Hi, I use SolrJ for cross-core search and it works correctly and fast. First, pay attention to the schema definition: you should try to use fields with the same name as much as possible. For example, all my schemas have a subset of common fields like title, summary, date, geo, image, etc

Re: SolrCloudServer and SolrServerException No live SolrServers available

2012-11-14 Thread iwo
Hi, With the same configuration, same core, and same data, but with the Solr 4.0 release, my project and JUnit test case work correctly with SolrCloudServer. I'm working with Lucid Works Ent., which doesn't use the latest Solr build; we asked Lucid to upgrade Solr. Thanks - To complicate is easy, to simplify

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread Carlos Alexandro Becker
thanks anyway, Shawn. On Wed, Nov 14, 2012 at 5:24 PM, Carlos Alexandro Becker wrote: > hmm... the least horrible way I can think of (if Solr doesn't support it by > default) is to create another core that "mixes" the information from the other > cores, and then search in it. > > But, well, it would

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread Carlos Alexandro Becker
hmm... the least horrible way I can think of (if Solr doesn't support it by default) is to create another core that "mixes" the information from the other cores, and then search in it. But, well, it would be ugly. On Wed, Nov 14, 2012 at 5:14 PM, Shawn Heisey wrote: > On 11/14/2012 10:48 AM, Carlos

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
I'm sure. I added it to 3.6 ;) You must have something funky with your tomcat configuration, like an exploded war with different versions of jars or some other form of jar hell. On Wed, Nov 14, 2012 at 9:32 AM, Frederico Azeiteiro wrote: > Are you sure about that? > > We have it working on: > >

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread Shawn Heisey
On 11/14/2012 10:48 AM, Carlos Alexandro Becker wrote: Hm, and what if my cores have different schemas? You might have to do all the heavy lifting yourself, after using SolrJ to retrieve the results. I will say that I have no idea -- there may be ways you can avoid doing that. I hope

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread Carlos Alexandro Becker
Hm, and what if my cores have different schemas? Thanks in advance. On Wed, Nov 14, 2012 at 3:35 PM, Shawn Heisey wrote: > On 11/14/2012 10:19 AM, Carlos Alexandro Becker wrote: > >> What's the best way to search in multiple cores and merge the results >> using >> solrj? >> > > Your bes

Re: Searching in multiple cores via SolrJ

2012-11-14 Thread Shawn Heisey
On 11/14/2012 10:19 AM, Carlos Alexandro Becker wrote: What's the best way to search in multiple cores and merge the results using solrj? Your best bet really is to have Solr do this for you with distributed search. You can add the shards parameter to your queries easily with SolrJ, or you c
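To make the `shards` suggestion above concrete, here is a small Python sketch that builds a distributed-search request URL by hand. It is not SolrJ, and it only constructs the URL rather than sending it. The host, port, and core names are placeholders invented for this example, and it assumes each listed core is reachable at `host:port/solr/<core>` and that the cores share a compatible schema.

```python
from urllib.parse import urlencode


def distributed_select_url(base_core_url, query, shards, rows=10):
    """Build a /select URL that asks one core to query several shards.

    `shards` is a list of "host:port/path" entries (no scheme), which are
    joined with commas into Solr's `shards` request parameter.
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        "shards": ",".join(shards),
    }
    return f"{base_core_url.rstrip('/')}/select?{urlencode(params)}"


# Placeholder cores on a local Solr instance.
url = distributed_select_url(
    "http://localhost:8983/solr/core0",
    "title:solr",
    ["localhost:8983/solr/core0", "localhost:8983/solr/core1"],
)
print(url)
```

The core that receives the request fans the query out to every entry in `shards` and merges the results, so the client sees a single response.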

Re: Has anyone HunspellStemFilterFactory working?

2012-11-14 Thread Rob Koeling
Thanks for your reply, Sergey! Well, I was a bit puzzled. I tried adding a line to set the character set before, but then it complained about that as well. I installed the Russian dictionary and Solr was happy to load that. I noticed that the character-set was only set in the affix file for Russia

RE: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Frederico Azeiteiro
Are you sure about that? We have it working on: Solr Specification Version: 3.5.0.2011.11.22.14.54.38 Solr Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:54:38 Lucene Specification Version: 3.5.0 Lucene Implementation Version: 3.5.0 1204988 - simon - 2011-11-22 14:46:51 Current Tim

Re: Run multiple instances of solr using single data directory

2012-11-14 Thread Rohit Harchandani
OK, but what are the problems when bringing up multiple instances reading from the same data directory? Also, how do I re-open the searchers without restarting Solr? Thanks, Rohit On Tue, Nov 13, 2012 at 11:20 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > If you have high query

Re: How to change the Solr 4.0 index format?

2012-11-14 Thread tomw
On Wed, 2012-11-14 at 18:50 +0200, Artem Lokotosh wrote: > See https://issues.apache.org/jira/browse/MAHOUT-1112 > Seems mahout doesn't yet support lucene 4.0 > That indeed seems to be the reason. Running the test with solr 3.6.1 works fine. thanks, --tomw

Re: How to change the Solr 4.0 index format?

2012-11-14 Thread Artem Lokotosh
See https://issues.apache.org/jira/browse/MAHOUT-1112 Seems mahout doesn't yet support lucene 4.0 On Wed, Nov 14, 2012 at 6:38 PM, Jack Krupansky wrote: > Check the dates for the Solr/Lucene jars - they might be an early snapshot > before the index format stabilized. > > Or, maybe that Mahout sub

Re: How to change the Solr 4.0 index format?

2012-11-14 Thread Jack Krupansky
Check the dates for the Solr/Lucene jars - they might be an early snapshot before the index format stabilized. Or, maybe that Mahout sub-project had a copy of some old Lucene data. Keeping old Lucene data around as opposed to reindexing is a rather bad idea. -- Jack Krupansky -Original

Re: Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Robert Muir
On Wed, Nov 14, 2012 at 8:12 AM, Frederico Azeiteiro wrote: > To do some further testing I installed SOLR 3.5.0 using default Jetty > server. > > When I tried to start SOLR using the same schema I get: > > > > SEVERE: org.apache.solr.common.SolrException: Error loading class > 'solr.CJKBigramFilte

Re: How to change the Solr 4.0 index format?

2012-11-14 Thread tomw
On Wed, 2012-11-14 at 17:57 +0200, Artem Lokotosh wrote: > > Does it mean that Solr is creating the index in some kind of > > old format? Is it possible to change the format? > > Try this > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/IndexUpgrader.html > I'm wondering why a n

Error loading class solr.CJKBigramFilterFactory

2012-11-14 Thread Frederico Azeiteiro
Hi, I've been testing some CJK tokenizers and I managed to get acceptable results using:

Re: How to change the Solr 4.0 index format?

2012-11-14 Thread Artem Lokotosh
> Does it mean that Solr is creating the index in some kind of > old format? Is it possible to change the format? Try this http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/IndexUpgrader.html On Wed, Nov 14, 2012 at 5:42 PM, tomw wrote: > Hi folks, > > I was trying to use an index

How to change the Solr 4.0 index format?

2012-11-14 Thread tomw
Hi folks, I was trying to use an index created by Solr 4.0 with Mahout. However, creating the vectors like: bin/mahout lucene.vector -d ~/apache-solr-4.0.0/example/solr/data/index --output /tmp/mahout/vectors --field text --idField id --dictOut /tmp/mahout/dict.txt --norm 2 fails with an error:

SolrCloud: Shard resize

2012-11-14 Thread ku3ia
Hi all! My index is dynamically updated. This means that every day I have new data, and every day I remove unused documents from it. I know approximately the number of documents that I'm indexing per day. Today I tested a situation. Simply imagine there is one collection and two shards wi

Re: Removing Shards from Zookeeper - no servers hosting shard

2012-11-14 Thread Mark Miller
Missed the list in my last reply: This used to work properly - I'm guessing that the zk layout refactoring right before 4.0 broke it. We likely need a JIRA issue, a fix, and a test. Mark On Nov 14, 2012, at 6:43 AM, Gilles Comeau wrote: > Hi all, > > I just wanted to make the simplest repro of

Re: Neary text search system with solr.

2012-11-14 Thread Ahmet Arslan
Kobayashi-san, I suspect you are hitting this: "The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT." If you append &debugQuery=on to your search URL, you can see parsed query et
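The debugQuery tip above amounts to adding one request parameter to an existing search URL. A minimal Python sketch follows, using only the standard library; the example URL is a placeholder invented here, and the helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_debug_query(search_url):
    """Return the same search URL with debugQuery=on added to its query."""
    parts = urlsplit(search_url)
    # Drop any existing debugQuery value so the parameter appears only once.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "debugQuery"]
    params.append(("debugQuery", "on"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_debug_query("http://localhost:8983/solr/select?q=World!+Hello"))
```

With the parameter set, the response carries a debug section that includes the parsed form of the query, which is where a stray `!` being read as NOT would show up.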

RE: Apache Solr Quiz

2012-11-14 Thread 菅沼 嘉一
I am almost a beginner with Solr, so this quiz site is very helpful for training. Could you let me add a question to this site? Q. The documents are ranked by the "Score" which is calculated by the "Nearness" of document and query. This score tends to increase depending on the length of

Neary text search system with solr.

2012-11-14 Thread alu
Hi. I am making a "Neary text search system" with Solr. Example: input text: Hello World! query: Hello World! response: Hello World! This case works. input text: Hello World! query: World! Hello response: Hello World! This does not work. I need text with the words swapped to match as well. How can I do that? -- V

Re: Error with SolrCloud

2012-11-14 Thread Carlos Alexandro Becker
Hi, Tomás helped me, and we found the issue. Basically, I had solrconfig.xml, schema.xml, etc. inside my war, and it looks like ZooKeeper doesn't look for these files on the classpath. The fix was pretty easy: I just copied the files to the proper location inside the solr folder, so I got something like this

RE: Removing Shards from Zookeeper - no servers hosting shard

2012-11-14 Thread Gilles Comeau
Hi all, I just wanted to make the simplest repro of this issue, which now I am thinking might be related to the decision made in: https://issues.apache.org/jira/browse/SOLR-3080 ? And this is the expected behaviour? 1. Download SOLR 4 production and extract. 2. Replace solr.xml in

Re: Has anyone HunspellStemFilterFactory working?

2012-11-14 Thread Сергей Бирюков
Rob, as regards your "problem" with 'SET charset': the word 'charset' must be replaced with the name of a character set (i.e. encoding). For example, you can write 'SET UTF-8'. BUT... Be careful! At least for Russian morphology, HunspellStemFilterFactory has bug(s) in its algorithm. Simple co

Re: Multivalued or not

2012-11-14 Thread Upayavira
I'm pretty sure that Solr only checks whether a field is multivalued at the point at which it receives the second value for a specific field. In your entry below, you only provided one value, so Solr wouldn't complain. Add another line to your , and I bet you it will moan at you. Upayavira On W

RE: Multivalued or not

2012-11-14 Thread Peter Kirk
Hi - and thanks to you and Erik. I have changed to schema version 1.5. /Peter -Original Message- From: Jeevanandam Madanagopal [mailto:je...@myjeeva.com] Sent: 14. november 2012 10:38 To: solr-user@lucene.apache.org Subject: Re: Multivalued or not Okay, I believe you're using Solr 3.6,

RE: Unable to run two multicore Solr instances under Tomcat

2012-11-14 Thread Adam Neal
Just to wrap up this one. Previously all the lib jars were located in the war file on our setup, this was mainly to ease deployment as it's just a single file. Moving the lib directory external to the war seems to have fixed the issue. Thanks for the pointer Erick. -Original Message-

Re: Admin Permissions

2012-11-14 Thread Juan Carlos Serrano
Basic HTTP authentication can be used to filter access to whichever URLs you want, so you can allow access to Query, Analysis, etc. and ban Admin. 2012/11/13 Erick Erickson > Slap them firmly on the wrist if they do? > > The Solr admin is really designed with trusted users in mind. There a

Re: Multivalued or not

2012-11-14 Thread Jeevanandam Madanagopal
Okay, I believe you're using Solr 3.6, so you can use schema version 1.5. However, you're currently using version 1.0; it is safer to update your schema version to 1.1, since multiValued is then false by default. FYI. Schema version info (from schema.xml):

RE: Multivalued or not

2012-11-14 Thread Peter Kirk
Should be 1.1 I see. -Original Message- From: Peter Kirk [mailto:p...@alpha-solutions.dk] Sent: 14. november 2012 10:24 To: solr-user@lucene.apache.org Subject: RE: Multivalued or not Hi, it says version 1.0 /Peter -Original Message- From: Erik Hatcher [mailto:erik.hatc...@g

RE: Multivalued or not

2012-11-14 Thread Peter Kirk
Hi, it says version 1.0 /Peter -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: 14. november 2012 10:22 To: solr-user@lucene.apache.org Subject: Re: Multivalued or not But what is your schema version? See the top of schema.xml. On Nov 14, 2012, at 4:17,

Re: Multivalued or not

2012-11-14 Thread Erik Hatcher
But what is your schema version? See the top of schema.xml. On Nov 14, 2012, at 4:17, Peter Kirk wrote: > Hi > > Thanks for the reply. It is strange, because when I index to a field defined > like: > > name="*_string" > stored="true" >

RE: Multivalued or not

2012-11-14 Thread Peter Kirk
Hi Thanks for the reply. It is strange, because when I index to a field defined like: Then the results I receive are like: Woodland Which seems to indicate a multivalued field. If I change the field definition, so I explicitly say multivalued is false: Then the result is li

Re: Multivalued or not

2012-11-14 Thread Jeevanandam Madanagopal
Hello Peter - In Solr 3.6 multiValued is false by default. From schema version 1.1 onwards, the multiValued attribute is false by default (, , ) -Jeeva Blog: http://www.myjeeva.com On Nov 14, 2012, at 2:04 PM, Peter Kirk wrote: > Hi > > In Solr 3.6, is multivalued for fields, default tr

Multivalued or not

2012-11-14 Thread Peter Kirk
Hi In Solr 3.6, is multivalued for fields, default true or false? It appears that it is default false for normal fields, and default true for dynamic fields - is that correct? Thanks, Peter