Re: shingles work in analyzer but not real data

2010-09-03 Thread Dennis Gearon
Thank you mucho much, Lance. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/3/10, Lance Norskog wrote: > From: Lance Norskog > Subject: Re: shingles work i

Re: shingles work in analyzer but not real data

2010-09-03 Thread Lance Norskog
http://en.wikipedia.org/wiki/W-shingling On Fri, Sep 3, 2010 at 6:19 AM, Steven A Rowe wrote: > Hi Dennis, > > I took a stab at answering this question in the following java-user mailing > list post: > > http://www.lucidimagination.com/search/document/6cb7b54cce6872b3/lucene_indexes > > Steve >

Re: solr user

2010-09-03 Thread Lance Norskog
Naming fields something_t but declaring them "string" will either not work, or cause confusion. On Thu, Sep 2, 2010 at 6:49 AM, kenf_nc wrote: > > You are querying for 'branch' and trying to place it in 'skill'. > > Also, you have Name and Column backwards, it should be: > > > > > > -- > View

Re: Solr crawls during replication

2010-09-03 Thread Shawn Heisey
On 9/3/2010 12:37 PM, Jonathan Rochkind wrote: Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips. Depends on what you w

Re: solr

2010-09-03 Thread ankita shinde
I didn't find "name" in my solrconfig.xml. Do I need to include this tag? On Sat, Sep 4, 2010 at 3:04 AM, Papiya Misra wrote: > What is the query that you are using ? Try something like q=city:Chicago . > > Look at the solrconfig file and you will see > name . This is the reason that > unless yo

Re: solr

2010-09-03 Thread Papiya Misra
What is the query that you are using ? Try something like q=city:Chicago . Look at the solrconfig file and you will see name . This is the reason that unless you specify the search field in the query, solr will always search the field name. On 09/03/2010 04:52 PM, ankita shinde wrote: hello, I

RE: Do commits block updates in SOLR 1.4?

2010-09-03 Thread Robert Petersen
Thanks guys! I will be quite happy to remove the unnecessary complexity from our code. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, September 03, 2010 10:28 AM To: solr-user@lucene.apache.org Subject: Re: Do commits block updates in SOLR 1.4? Solr h

Boost, weight, proximity, ranking which one?

2010-09-03 Thread javaxmlsoapdev
I am using Solr version 1.4. I have a requirement to show first all documents that match the most words from the free-text search string. E.g., if the user searches for two words with no quotes, "connectivity breakup", my search results should display all documents where both words mat

solr

2010-09-03 Thread ankita shinde
hello, I have done all the suggested changes. My table name is 'info' having columns id,name,city and skill. I am able to index them all successfully. But I am able to search the data only using name and not other column names. Where did I go wrong? *My data-config.xml file is as below:*

Re: Solr crawls during replication

2010-09-03 Thread Mark
On 9/3/10 11:37 AM, Jonathan Rochkind wrote: Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips. _

RE: Solr crawls during replication

2010-09-03 Thread Jonathan Rochkind
Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips. From: Shawn Heisey [s...@elyogr

Re: Hardware Specs Question

2010-09-03 Thread Shawn Heisey
On 9/3/2010 3:39 AM, Toke Eskildsen wrote: I'll have to extrapolate a lot here (also known as guessing). You don't mention what kind of harddrives you're using, so let's say 15.000 RPM to err on the high-end side. Compared to the 2 drives @ 15.000 RPM in RAID 1 we've experimented with, the diffe

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-03 Thread Scott Gonyea
I've been considering the use of Hadoop, since that's what Nutch uses. Unless I piggy-back onto Nutch's MR job, when creating a Solr index, I'm wondering if it's overkill. I can see ways of working it into a MapReduce workflow, but it would involve dumping the database onto HDFS beforehand. I'm st

Re: Solr crawls during replication

2010-09-03 Thread Shawn Heisey
On 9/2/2010 9:31 AM, Mark wrote: Thanks for the suggestions. Our slaves have 12G with 10G dedicated to the JVM.. too much? Are the rsync snappuller features still available in 1.4.1? I may try that to see if it helps. Configuration of the switches may also be possible. Also, would you mind expl

Re: Throttling replication

2010-09-03 Thread Brandon Evans
On 9/2/10 3:20 PM, Koji Sekiguchi wrote: (10/09/03 5:42), Brandon Evans wrote: On 9/2/10 11:16 AM, Mark wrote: I am using the built in replication. Can you send me a link to the patch so I can give it a try? Thanks This patch looks great! Can you open a jira issue and contribute the patc

Solr + Katta ... benefits?

2010-09-03 Thread thiseye
I'm investigating using Lucene for a project to index a massive HBase database. I was looking at using Katta to distribute the index because people have said that becomes a limitation with simply using Lucene as the index grows. Then I came across Solr which seems like it would also help this proj

Re: Do commits block updates in SOLR 1.4?

2010-09-03 Thread Mark Miller
Solr handles all of this concurrency for you - it's actually even a little too aggressive about that these days, as Lucene has changed a lot - but yes - you can add while committing and commit while adding - Solr will block itself as needed. - Mark On 9/3/10 1:27 PM, Robert Petersen wrote: > So y

RE: Do commits block updates in SOLR 1.4?

2010-09-03 Thread Robert Petersen
So you are saying we definitely do not need to pause ADD activity on other threads while we send the COMMIT? And the same goes with AUTOCOMMIT right? We are using SOLR 1.4 now. We were on 1.3 previously. We pretty much just assumed pausing ADDs during COMMITs was required by SOLR when we desi

Re: Patch to pass a file to the first and new index searchers

2010-09-03 Thread Papiya Misra
Ok - found it - https://issues.apache.org/jira/browse/SOLR-784 . On 09/03/2010 11:51 AM, Papiya Misra wrote: I do not want to make the solrconfig.xml huge. I think I saw a patch a couple of weeks back that allowed passing a csv file as a parameter. Can anyone help ? Thanks Papiya Pink OTC M

Re: false matches with ReversedWildcardFilterFactory

2010-09-03 Thread Robert Muir
On Fri, Sep 3, 2010 at 11:55 AM, Yonik Seeley wrote: > > Off the top of my head, I'm not sure of an easy way to prevent this. > we could fix this in trunk easily for these queries, with intersection/subtraction (e.g. minus "\u0001.*" from the DFA) -- Robert Muir rcm...@gmail.com

Re: false matches with ReversedWildcardFilterFactory

2010-09-03 Thread Yonik Seeley
On Thu, Sep 2, 2010 at 1:10 PM, Landon Kuhn wrote: > Hello, I am using the ReversedWildcardFilterFactory, and I am > wondering if there is a way to prevent false matches when a query > token matches the reversed indexed token. For instance, the query > *zemog* matches documents that contain Gomez.
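The false match Landon describes can be illustrated with a small stand-alone sketch. It assumes (this is not spelled out in the excerpt) that the filter indexes each token a second time in reversed form, prefixed with the marker character \u0001 that Robert's reply refers to. The class and method names are hypothetical, and plain java.util.regex stands in for Lucene's wildcard matching.

```java
import java.util.regex.Pattern;

public class ReversedWildcardSketch {
    // Assumed marker character that prefixes reversed tokens in the index.
    static final char MARKER = '\u0001';

    // Builds the extra term assumed to be indexed next to the original token.
    static String reversedWithMarker(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    // Translates a * / ? wildcard into a regex and tests it against a whole term.
    static boolean wildcardMatches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    // Same match, but terms that start with the marker are excluded,
    // loosely mirroring the idea of subtracting "\u0001.*" from the query.
    static boolean matchesExcludingMarked(String wildcard, String term) {
        return wildcardMatches(wildcard, term) && term.indexOf(MARKER) != 0;
    }

    public static void main(String[] args) {
        String original = "gomez";
        String reversed = reversedWithMarker(original);
        System.out.println(wildcardMatches("*zemog*", original));        // false
        System.out.println(wildcardMatches("*zemog*", reversed));        // true: the leading * absorbs the marker
        System.out.println(matchesExcludingMarked("*zemog*", reversed)); // false
    }
}
```

The second line of output is the false match: the leading wildcard swallows the marker, so the reversed form of "gomez" satisfies *zemog*. How Solr would actually apply the exclusion is not shown in the excerpt.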

Re: Hardware Specs Question

2010-09-03 Thread Dennis Gearon
I wouldn't have thought that CPU was a big deal with the speed/cores of CPUs continuously growing according to Moore's law and disk speed barely changing 50% in 15 years. Must have a lot to do with caching. What size indexes are you working with? Are you saying you can get the who

Patch to pass a file to the first and new index searchers

2010-09-03 Thread Papiya Misra
I do not want to make the solrconfig.xml huge. I think I saw a patch a couple of weeks back that allowed passing a csv file as a parameter. Can anyone help ? Thanks Papiya Pink OTC Markets Inc. provides the leading inter-dealer quotation and trading system in the over-the-counter (OTC) securit

Re: how/why would I use LiteralValueSource and can I create a custom string function?

2010-09-03 Thread Gerald
I don't really have any specific use case in mind; I was just wondering what I could (or couldn't) do with custom functions. Possible reasons for allowing that type of syntax include: 1. in general, to simplify queries, and make them more readable, by eliminating the need for the _val_ hack (which

Re: Hardware Specs Question

2010-09-03 Thread scott chu
well balanced system = Agree. Here we'll start a performance & load test this month. I've defined a test criteria of 'qps', 'RTpQ' & worse case according to our use case & past experience. Our goal is pursuing this criteria & adjust hardware & system configuration to find a well

Re: Auto Suggest

2010-09-03 Thread dan sutton
I set this up a few years ago with something like the following:

RE: how to deal with virtual collection in solr?

2010-09-03 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help, Jan Høydahl. Have a great weekend! Xiaohui -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Friday, September 03, 2010 3:46 AM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr? You

Re: Auto Suggest

2010-09-03 Thread Luke Tebbs
What about if you do something like this? - facet=true&facet.mincount=1&q=apple&facet.limit=10&facet.prefix=mou&facet.field=term_suggest&qt=basic&wt=javabin&rows=0&version=1 Jason Rutherglen wrote: To clarify, the query analyzer returns that. Variations such as "apple mou" also do not return

Re: Localsolr with Dismax => workaround using spatial solr

2010-09-03 Thread Luke Tebbs
I finally managed to get spatial searching working in combination with dismax so I'm sending this should anyone else have the same problem. I gave up using localsolr in the end - one of the resultsets of the two it returned was correct (dismax+spatial) but I don't trust this enough to depend u

Re: Auto Suggest

2010-09-03 Thread Jason Rutherglen
To clarify, the query analyzer returns that. Variations such as "apple mou" also do not return anything. Maybe Jay can comment and then we can amend the article? On Fri, Sep 3, 2010 at 6:12 AM, Jason Rutherglen wrote: > Analysis returns "app mou". > > On Thu, Sep 2, 2010 at 6:12 PM, Lance Norsk

Re: Index time boosting

2010-09-03 Thread phoey
thanks for replying erick, We are currently using dismax but for this particular client we have coupled their implementation to standard parser and will be difficult to switch, although i might just have to bite the bullet for this. "which you can do without dismax BTW, although you have to int

Re: Does SolrNet support indexing of Database tables and XML files

2010-09-03 Thread kenf_nc
Alok, I noticed you also posted to the SolrNet forum, and that's a better place for this question. But basically, SolrNet is a wrapper around Solr functionality. It lets you build your Solr interactions (Queries, Stats, Facets, etc) and Inserts/Deletes using .Net objects. The reading of a data so

Re: spellcheck distance measure algorithms error ?

2010-09-03 Thread Xavier Schepler
On 03/09/2010 15:31, Grant Ingersoll wrote: On Sep 3, 2010, at 9:14 AM, Xavier Schepler wrote: On 03/09/2010 14:47, Grant Ingersoll wrote: On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote: no, jopsin isn't in the index. I tried this with other words and I had the same error

RE: Does SolrNet support indexing of Database tables and XML files

2010-09-03 Thread Michael Griffiths
First of all, I suggest you ask in the SolrNET group: http://groups.google.com/group/solrnet Second, Solr supports both database tables and XML files through the Data Import Handler (DIH). You may wish to configure indexing in Solr, then query via SolrNET. -Original Message- From: alokd

Re: Index with ItalianStemmer

2010-09-03 Thread Robert Muir
On Fri, Sep 3, 2010 at 8:04 AM, Tommaso Teofili wrote: > Does anyone know what could be the root cause or if I am missing something? > Thanks in advance for any help, > Tommaso > I didn't see a definition of your 'query' analyzer, only 'index'. Can you ensure you specify Italian Stemmer at 'query

Re: spellcheck distance measure algorithms error ?

2010-09-03 Thread Grant Ingersoll
On Sep 3, 2010, at 9:14 AM, Xavier Schepler wrote: > On 03/09/2010 14:47, Grant Ingersoll wrote: >> On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote: > no, jopsin isn't in the index. > I tried this with other words and I had the same error. > Thx for your reply. And what happens if you drop th

Does SolrNet support indexing of Database tables and XML files

2010-09-03 Thread alokdayal
Dear All, I've developed an application in C# with SolrNet that indexes and searches text files. Now I am trying to change the application to index and search a database table as well as an XML file, but it's not working. My question is whether SolrNet supports indexing and search

RE: shingles work in analyzer but not real data

2010-09-03 Thread Steven A Rowe
Hi Dennis, I took a stab at answering this question in the following java-user mailing list post: http://www.lucidimagination.com/search/document/6cb7b54cce6872b3/lucene_indexes Steve > -Original Message- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Friday, September 03

Re: spellcheck distance measure algorithms error ?

2010-09-03 Thread Xavier Schepler
On 03/09/2010 14:47, Grant Ingersoll wrote: On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote: Hi, When I take the two letters from the middle of a word and put the first in place of the second and the second in place of the first, ex : jospin => jopsin, I don't get any suggestion from

Re: Auto Suggest

2010-09-03 Thread Jason Rutherglen
Analysis returns "app mou". On Thu, Sep 2, 2010 at 6:12 PM, Lance Norskog wrote: > What does analysis.jsp show? > > On Thu, Sep 2, 2010 at 5:53 AM, Jason Rutherglen > wrote: >> I'm having a different issue with the EdgeNGram technique described >> here: >> http://www.lucidimagination.com/blog/2

Re: how/why would I use LiteralValueSource and can I create a custom string function?

2010-09-03 Thread Grant Ingersoll
On Sep 2, 2010, at 7:40 PM, Gerald wrote: > > Thanks Grant > > Am looking forward to the day when I can create a SOLR URL that looks > something like this: > > http://mysolrserver:8080/solr/select?q=*:* AND > mycustomstrfunction(mysolrstrfield):'somestringvalue' AND > mycustomintfunction(mysol

Re: spellcheck distance measure algorithms error ?

2010-09-03 Thread Grant Ingersoll
On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote: > Hi, > > When I take the two letters from the middle of a word and put the first in > place of the second and the second in place of the first, ex : jospin => > jopsin, I don't get any suggestion from the spellchecker component. > > I tried

Re: Purpose of SolrDocument.java

2010-09-03 Thread Peter Karich
> aaah okay. > > so its SolrDocument in "normal" search never been used ? its only for other > solr-plugins ? > SolrDocument is under org.apache.solr.common, which is for the solr-solrj.jar and not available for the solr-core.jar. See e.g.: http://lucene.apache.org/solr/api/org/apache/solr/commo

Index with ItalianStemmer

2010-09-03 Thread Tommaso Teofili
Hi all, I am experiencing a strange behavior while indexing italian text (an indexed not stored text field) when stemming with italian language: generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseC

Re: Index time boosting

2010-09-03 Thread Erick Erickson
Have you tried running the queries through with &debugQuery=true? It may well be that, for certain documents, the lower-boosted fields are still overwhelming the contribution to scoring from the higher-boosted fields for the documents in question. The problem is that index-time boosting is fairly

Re: SolrJ and Multi Core Set up

2010-09-03 Thread Shaun Campbell
Thanks Chantal I hadn't spotted that that's a big help. Thank you. Shaun On 3 September 2010 12:31, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Hi Shaun, > > you create the SolrServer using multicore by just adding the core to the > URL. You don't need to add anything with Solr

Re: stream.url

2010-09-03 Thread satya swaroop
Hi all, I am unable to index files on a remote system that contain escaped characters in their file names. I think there is a problem in Solr with indexing files with escaped characters on a remote system... Has anybody tried to index the files in remote system that contain the escaped

Re: SolrJ and Multi Core Set up

2010-09-03 Thread Chantal Ackermann
Hi Shaun, you create the SolrServer using multicore by just adding the core to the URL. You don't need to add anything with SolrQuery. URL url = new URL(new URL(solrBaseUrl), coreName); CommonsHttpSolrServer server = new CommonsHttpSolrServer(url); Concerning the "default" core thing - I wouldn'
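A minimal sketch of the URL resolution in Chantal's snippet, using only java.net so it runs without SolrJ on the classpath. The base URL and core name are made-up values and the helper name is hypothetical. One detail worth checking in your own setup: relative resolution drops the last path segment of the base unless the base ends with a slash.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class CoreUrlSketch {
    // Resolves a core name against a Solr base URL, as in
    // new URL(new URL(solrBaseUrl), coreName) from the message above.
    static String coreUrl(String solrBaseUrl, String coreName) {
        try {
            return new URL(new URL(solrBaseUrl), coreName).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad Solr URL: " + solrBaseUrl, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, port and core name.
        System.out.println(coreUrl("http://localhost:8983/solr/", "core0")); // http://localhost:8983/solr/core0
        System.out.println(coreUrl("http://localhost:8983/solr", "core0"));  // http://localhost:8983/core0
    }
}
```

The resulting URL is what would be passed to the CommonsHttpSolrServer constructor in the quoted code. With a base that lacks the trailing slash, the "solr" segment is lost, so the server would point at the wrong path.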

Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote: > If you really want to see performance, try external DRAM disks. > Whew! 800X faster than a disk. As sexy as they are, DRAM drives do not buy much extra performance. At least not at the search stage. For searching, SSDs are not th

Re: Purpose of SolrDocument.java

2010-09-03 Thread stockii
aaah okay. so its SolrDocument in "normal" search never been used ? its only for other solr-plugins ? -- View this message in context: http://lucene.472066.n3.nabble.com/Purpose-of-SolrDocument-java-tp1408443p1411276.html Sent from the Solr - User mailing list archive at Nabble.com.

SolrJ and Multi Core Set up

2010-09-03 Thread Shaun Campbell
I'm writing a client using SolrJ and was wondering how to handle a multi core installation. We want to use the facility to rebuild the index on one of the cores at a scheduled time and then use the SWAP facility to switch the "live" core to the newly rebuilt core. I think I can do the SWAP with C

spellcheck distance measure algorithms error ?

2010-09-03 Thread Xavier Schepler
Hi, When I take the two letters from the middle of a word and put the first in place of the second and the second in place of the first, ex : jospin => jopsin, I don't get any suggestion from the spellchecker component. I tried the default algorithm and the Jaro Winkler Distance, with a coef
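For background on the swap described above, here is a small sketch of plain Levenshtein edit distance. It is a textbook implementation written for illustration, not the spellchecker's own code. Under this measure, swapping two adjacent letters costs two edits rather than one. Whether that explains the missing suggestions is not established in the thread.

```java
public class TranspositionSketch {
    // Classic Levenshtein distance (insert, delete, substitute; no transposition).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("jospin", "jopsin")); // 2
    }
}
```

A distance of 2 on a six-letter word is a fairly large relative difference, so any accuracy threshold applied to a normalized score could plausibly filter such a candidate out.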

Re: shingles work in analyzer but not real data

2010-09-03 Thread 朱炎詹
Look up p. 288 in the "Solr 1.4 Enterprise Search Engine" book by Eric & David. Shingling is suitable for phrase query case based on token level, it's similar to n-gram. However, the latter one is based on term. We are currently using shingling in our index with shingle size = 3. Be careful, th

Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote: > On 9/2/2010 2:54 AM, Toke Eskildsen wrote: > > We've done a fair amount of experimentation in this area (1997-era SSDs > > vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in > > RAID 0). The harddisk setups never stood a c

Re: shingles work in analyzer but not real data

2010-09-03 Thread Jeff Rose
I don't have any fancy links, but from the documentation shingles make pretty good sense. You typically tokenize an input string so that "the best apple pie" becomes "the" "best" "apple" "pie", so that each term can then be filtered to remove stop words, take off plurals and suffixes like "ing", e
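The idea Jeff describes can be sketched in a few lines: after tokenizing, each run of adjacent tokens is joined into a single "shingle" term. This is an illustrative stand-alone version with hypothetical names, not Solr's shingle filter, which has further options (it can, for instance, also emit the single tokens alongside the shingles).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {
    // Joins every run of `size` adjacent tokens into one shingle.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the best apple pie".split(" "));
        System.out.println(shingles(tokens, 2)); // [the best, best apple, apple pie]
    }
}
```

Because each shingle is indexed as one term, a match on "apple pie" carries word-order information that separate matches on "apple" and "pie" do not.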

Index time boosting

2010-09-03 Thread phoey
Hi there, I'm having some issues with the relevancy of some results. I have 5 fields, with varying boost values, being copied into a copyfield "text" which is the field searched on ... I'm sending each of these fields with the boost values (i_title is 20, i_authors is 10 ... i

Re: Hardware Specs Question

2010-09-03 Thread Dennis Gearon
If you really want to see performance, try external DRAM disks. Whew! 800X faster than a disk. Dennis Gearon --- On Thu, 9/2/10, Shawn Hei

Re: shingles work in analyzer but not real data

2010-09-03 Thread Dennis Gearon
Anyone got a definitive, authoritative link to the definition of a 'shingle' in search engine results/technology? Dennis Gearon --- On Fri

Re: shingles work in analyzer but not real data

2010-09-03 Thread Jeff Rose
Thanks Steven and Jonathan, we got it working by using a combination of quoting and the PositionFilterFactory, like is shown below. The documentation for the position filter doesn't make much sense without understanding more about how positioning of tokens is taken into account, but it appears to

Re: Auto Suggest

2010-09-03 Thread Jan Høydahl / Cominvent
Are you phrasing the query, like &q="app mou" ? I guess with edgeNgram you use KeywordTokenizer which stores phrases as single terms. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 2. sep. 2010, at 14.53, Jason Rutherglen w

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-03 Thread Jan Høydahl / Cominvent
Hi, This smells like a job for Hadoop and perhaps Mahout, unless your use cases are totally ad-hoc research. After Nutch has fetched the sites, kick off some MapReduce jobs for each case you wish to study: 1. Extract phrases/contexts 2. For each context, perform detection and whitelisting 3. In

Re: how to deal with virtual collection in solr?

2010-09-03 Thread Jan Høydahl / Cominvent
You did not supply your actual query. Try to add a &q=foobar parameter, also you don't need a & before shards since you have the ?. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 1. sep. 2010, at 20.14, Ma, Xiaohui (NIH/NLM/

Re: Purpose of SolrDocument.java

2010-09-03 Thread Peter Karich
Hi, you can use it via SolrJ: QueryResponse rsp = solrServer.query(query); SolrDocumentList docs = rsp.getResults(); for (SolrDocument doc : docs) { long id = (Long) doc.getFieldValue("id"); // create your higher level object here ... } SolrJ gets the docs either from xml or binary stre