logic required for newbie

2010-07-27 Thread Jonty Rhods
Hi All, I am very new and learning Solr. I have 10 columns in a table, as follows: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5. When a user searches for a landmark, I want to return only the one landmark that matches. Rest of the lan

facet total score instead of total count

2010-07-27 Thread Bharat Jain
Hi, I have a requirement where I want to sum up the scores of the faceted fields. This will decide the relevancy for us. Is there a way to do it on a facet field? Basically, instead of giving the count of records for a facet field, I would like to have the total sum of scores for those records. Any
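
A minimal sketch of the idea in the question above, done on the client rather than inside Solr. It assumes the matching documents have already been fetched together with their scores; the field name `category` and the sample documents are hypothetical, and nothing here is a built-in Solr facet feature.

```python
from collections import defaultdict


def sum_scores_by_facet(docs, facet_field):
    """Return {facet value: summed score} for a list of result documents."""
    totals = defaultdict(float)
    for doc in docs:
        values = doc.get(facet_field, [])
        # Treat a single-valued field like a one-element multi-valued field.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            totals[value] += doc["score"]
    return dict(totals)


# Hypothetical result documents, each carrying its relevancy score.
docs = [
    {"id": "1", "category": "books", "score": 1.5},
    {"id": "2", "category": "books", "score": 0.5},
    {"id": "3", "category": "music", "score": 2.0},
]
totals = sum_scores_by_facet(docs, "category")
```

In this sample a plain facet count would rank "books" (2 documents) above "music" (1 document), while the summed scores come out equal.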

Re: Russian stemmer

2010-07-27 Thread Dennis Gearon
I have studied some Russian. I kind of got the picture from the texts that all the exceptions had already been 'found', and were listed in the book. I do know that languages are living, changing organisms, but Russian has got to be more regular than English I would think, even WITH all six case

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread Lance Norskog
Should this go into the trunk, or does it only solve problems unique to your use case? On Tue, Jul 27, 2010 at 5:49 AM, Chantal Ackermann wrote: > Hi Mitch, > > thanks for the code. Currently, I've got a different solution running > but it's always good to have examples. > >> > If realized >> > t

Re: Indexing Problem: Where's my data?

2010-07-27 Thread Lance Norskog
Solr respects case for field names. Database fields are supplied in lower-case, so it should be 'attribute_name' and 'string_value'. Also 'product_id', etc. It is easier if you carefully emulate every detail in the examples, for example lower-case names. On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc

Re: slave index is bigger than master index

2010-07-27 Thread Lance Norskog
Ah! You have junk files piling up in the slave index directory. When this happens, you may have to remove data/index entirely. I'm not sure if Solr replication will handle that, or if you have to copy the whole index to reset it. You said the slaves time out- maybe the files are so large that the

Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-27 Thread Lance Norskog
There are two different datasets that Solr (Lucene really) saves from a document: raw storage and the indexed terms. I don't think the ExtractingRequestHandler ever automatically stored the raw data; in fact Lucene works in Strings internally, not raw byte arrays (this is changing). It should be i

Re: Spellchecking and frequency

2010-07-27 Thread Erick Erickson
"Yonik's Law of Patches" reads: "A half-baked patch in Jira, with no documentation, no tests and no backwards compatibility is better than no patch at all." It'd be perfectly appropriate, IMO, for you to post an outline of what your enhancements do over on the SOLR dev list and get a reaction from

Re: Tika, Solr running under Tomcat 6 on Debian

2010-07-27 Thread Lance Norskog
I would start over from the Solr 1.4.1 binary distribution and follow the instructions on the wiki: http://wiki.apache.org/solr/ExtractingRequestHandler (Java classpath stuff is notoriously difficult, especially when dynamically configured and loaded. I often cannot tell if Java cannot load the c

RE: How to 'filter' facet results

2010-07-27 Thread Jonathan Rochkind
> Is there a way to tell Solr to only return a specific set of facet values? I > feel like the facet query must be able to do this, but I'm not really > understanding the facet query. In my specific case, I'd like to only see > facet > values for the same values I pass in as query filters, i.e.

How to 'filter' facet results

2010-07-27 Thread David Thompson
Is there a way to tell Solr to only return a specific set of facet values? I feel like the facet query must be able to do this, but I'm not really understanding the facet query. In my specific case, I'd like to only see facet values for the same values I pass in as query filters, i.e. if I run
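
The replies in this listing are cut off, so the sketch below is only one possible approach, not the answer given on the list: post-filter the facet counts on the client so that only the values already used as query filters remain. The facet values and counts are hypothetical.

```python
def keep_selected_facets(facet_counts, selected_values):
    """Keep only the facet values that were also passed in as query filters."""
    wanted = set(selected_values)
    return {value: count for value, count in facet_counts.items() if value in wanted}


# Hypothetical facet counts as returned for one field.
facet_counts = {"red": 12, "green": 7, "blue": 3}
filtered = keep_selected_facets(facet_counts, ["red", "blue"])
```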

Re: Highlighting parameters wiki

2010-07-27 Thread Koji Sekiguchi
(10/07/27 23:16), Stephen Green wrote: The wiki entry for hl.highlightMultiTerm: http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm doesn't appear to be correct. It says: If the SpanScorer is also being used, enables highlighting for range/wildcard/fuzzy/prefix queries.

Re: Indexing Problem: Where's my data?

2010-07-27 Thread kenf_nc
for STRING_VALUE, I assume there is a property in the 'select *' results called string_value? if so I'm not sure why it wouldn't work. If not, then that's why, it doesn't have anything to put there. For ATTRIBUTE_NAME, is it possibly a case issue? you called it 'Attribute_Name' in your query, but

RE: Querying throws java.util.ArrayList.RangeCheck

2010-07-27 Thread Manepalli, Kalyan
Yonik, One more update on this. I used the filter query that was throwing error and used it to delete a subset of results. After that the queries started working correctly. Which indicates that the particular docId was present in the index somewhere, but lucene was not able to find it.

Re: Querying throws java.util.ArrayList.RangeCheck

2010-07-27 Thread Yonik Seeley
I haven't been able to reproduce anything... But if you guys are sure you're not running any custom code, then there definitely seems to be a bug somewhere. Can anyone reproduce this in something you can share? -Yonik http://www.lucidimagination.com

Indexing Problem: Where's my data?

2010-07-27 Thread Michael Griffiths
Hi, (The first version of this was rejected for spam). I'm setting up a test instance of Solr, and keep running into the problem of having Solr not work the way I think it should work. Specifically, the data I want to go into the index isn't there after indexing. I'm extracting the data from M

min/max, StatsComponent, performance

2010-07-27 Thread Jonathan Rochkind
I thought I asked a variation of this before, but I don't see it on the list, apologies if this is a duplicate, but I have new questions. So I need to find the min and max value of a result set. Which can be several million documents. One way to do this is the StatsComponent. One problem is t

Re: Difficulties with Highlighting

2010-07-27 Thread Nathaniel Grove
Erik, You're right on both accounts. I'll upgrade and then check into whether our tokenizer is working properly. Thanks, Than Erik Hatcher wrote: Than - Looks like maybe your text_bo field type isn't analyzing how you'd like? Though that's just a hunch. I pasted the value of that field

Re: Querying throws java.util.ArrayList.RangeCheck

2010-07-27 Thread Jason Ronallo
I am getting a similar error with today's nightly build: HTTP Status 500 - Index: 54, Size: 24 java.lang.IndexOutOfBoundsException: Index: 54, Size: 24 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.lucene.index.FieldInfos.fieldIn

Re: Difficulties with Highlighting

2010-07-27 Thread Erik Hatcher
Than - Looks like maybe your text_bo field type isn't analyzing how you'd like? Though that's just a hunch. I pasted the value of that field returned in the link you provided into your analysis.jsp page and it chunked tokens by whitespace. Though I could be experiencing a copy/ paste/i

Re: SolrCore has a large number of SolrIndexSearchers retained in "infoRegistry"

2010-07-27 Thread skommuri
Thank you very much Hoss for the reply. I am using the embedded mode (SolrServer). I am not explicitly accessing SolrIndexSearcher. I am explicitly closing the SolrCore after the request has been processed. Although I did notice that I am using SolrQueryRequest object and is not explicitly getti

Re: SolrCore has a large number of SolrIndexSearchers retained in "infoRegistry"

2010-07-27 Thread Ken Krugler
On Jul 27, 2010, at 12:21pm, Chris Hostetter wrote: : : I was wondering if anyone has found any resolution to this email thread? As Grant asked in his reply when this thread was first started (December 2009)... It sounds like you are either using embedded mode or you have some custom c

Difficulties with Highlighting

2010-07-27 Thread Nathaniel Grove
I'm a relative beginner at SOLR, indexing and searching Unicode Tibetan texts. I am trying to use the highlighter but it just returns empty elements, such as: What am I doing wrong? The query that generated that is: http://www.thlib.org:8080/thdl-solr/thdl-texts/select?inden

Re: help finding illegal chars in XML doc

2010-07-27 Thread Chris Hostetter
: Thanks for your reply. I could not find in the log files any mention to : that. By the way I only have _MM_DD.request.log files in my directory. : : Do I have to enable any specific log or level to catch those errors? if you are using that "java -jar start.jar" command for the example jet

Re: SolrCore has a large number of SolrIndexSearchers retained in "infoRegistry"

2010-07-27 Thread Chris Hostetter
: : I was wondering if anyone has found any resolution to this email thread? As Grant asked in his reply when this thread was first started (December 2009)... >> It sounds like you are either using embedded mode or you have some >> custom code. Are you sure you are releasing your resources co

Re: Timeout in distributed search

2010-07-27 Thread Chris Hostetter
: Is there anyway to have time out support in distributed search. I : searched https://issues.apache.org/jira/browse/SOLR-502 but looks it is : not in main release of solr1.4 note that issue is marked "Fix Version/s: 1.3" ... that means it was fixed in Solr 1.3, well before 1.4 came out. Yo

RE: Spellchecking and frequency

2010-07-27 Thread Dyer, James
Mark, I'd like to see your code if you open a JIRA for this. I recently opened SOLR-2010 with a patch that does something similar to the second part only of what you describe (find combinations that actually return a match). But I'm not sure if my approach is the best one so I would like to see

Re: Total number of terms in an index?

2010-07-27 Thread Michael McCandless
In trunk (flex) you can ask each segment for its unique term count. But to compute the unique term count across all segments is necessarily costly (requires merging them, to de-dup), as Hoss described. Mike On Tue, Jul 27, 2010 at 12:27 PM, Burton-West, Tom wrote: > Hi Jason, > > Are you lookin
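
A toy illustration of why per-segment counts cannot simply be added up: the same term can appear in several segments, so an exact index-wide count needs a merge that removes duplicates. Segments are modelled as plain Python sets here; this is not the Lucene API, and the terms are made up.

```python
def unique_term_count(segments):
    """Merge the term sets of all segments and count each term once."""
    merged = set()
    for terms in segments:
        merged.update(terms)
    return len(merged)


# Hypothetical segments; "lucene", "index" and "facet" each occur in two of them.
segments = [
    {"solr", "lucene", "index"},
    {"lucene", "facet"},
    {"index", "facet", "stem"},
]
naive_total = sum(len(terms) for terms in segments)  # counts shared terms twice
exact_total = unique_term_count(segments)            # de-duplicated across segments
```

The cost of the exact count grows with the total number of terms, because every term in every segment has to be visited.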

RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-27 Thread David Thibault
Alessandro & all, I was having the same issue with Tika crashing on certain PDFs. I also noticed the bug where no content was extracted after upgrading Tika. When I went to the SOLR issue you link to below, I applied all the patches, downloaded the Tika 0.8 jars, restarted tomcat, posted a f

Re: Spellchecking and frequency

2010-07-27 Thread Mark Holland
Hi, I found the suggestions returned from the standard solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and misspelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell libra

does this indicate a commit happened for every add?

2010-07-27 Thread Robert Petersen
I'm adding lots of small docs with several threads to solr and the adds start fast but then slow down. I didn't do any explicit commits and autocommit is turned off but the logs show lots of commit activity on this core and restarting this solr core logged the below. Where did all these commits c

SpatialSearch: sorting by distance

2010-07-27 Thread Pavel Minchenkov
Hi, I'm trying to sort by distance like this: sort=dist(2,lat,lon,55.755786,37.617633) asc In general results are sorted, but some documents are not in right order. I'm using DistanceUtils.getDistanceMi(...) from lucene spatial to calculate real distance after reading documents from Solr. Solr
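
One possible explanation, not confirmed anywhere in this listing: `dist(2, ...)` is a Euclidean distance over the raw latitude and longitude values, while a great-circle distance accounts for longitude degrees shrinking away from the equator. The sketch below compares the two orderings around the coordinates from the message; the two test points are hypothetical and the haversine formula stands in for whatever the Lucene spatial utility computes.

```python
import math


def euclidean_degrees(lat1, lon1, lat2, lon2):
    """Straight-line distance in degree space, ignoring the Earth's curvature."""
    return math.hypot(lat1 - lat2, lon1 - lon2)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


origin = (55.755786, 37.617633)          # point used in the sort parameter above
north = (origin[0] + 1.0, origin[1])     # hypothetical document 1 degree north
east = (origin[0], origin[1] + 1.5)      # hypothetical document 1.5 degrees east

euclid_north = euclidean_degrees(*origin, *north)
euclid_east = euclidean_degrees(*origin, *east)
km_north = haversine_km(*origin, *north)
km_east = haversine_km(*origin, *east)
```

In degree space the northern point is nearer, yet on the sphere the eastern point is nearer, so the two measures can order the same documents differently at this latitude.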

RE: Total number of terms in an index?

2010-07-27 Thread Burton-West, Tom
Hi Jason, Are you looking for the total number of unique terms or total number of term occurrences? Checkindex reports both, but does a bunch of other work so is probably not the fastest. If you are looking for total number of term occurrences, you might look at contrib/org/apache/lucene/misc

Re: java "GC overhead limit exceeded"

2010-07-27 Thread Text Analysis
Look into -XX:-GCUseOverheadLimit On 7/26/10, Jonathan Rochkind wrote: > I am now occasionally getting a Java "GC overhead limit exceeded" error > in my Solr. This may or may not be related to recently adding much > better (and more) warming querries. > > I can get it when trying a 'commit', afte

Is it possible to get keyword/match's position?

2010-07-27 Thread Ryan Chan
According to SO: http://stackoverflow.com/questions/1557616/retrieving-per-keyword-field-match-position-in-lucene-solr-possible it is not possible, but that answer is a year old. Is it still true now? Thanks.

RE: Querying throws java.util.ArrayList.RangeCheck

2010-07-27 Thread Manepalli, Kalyan
Hi Yonik, I am using Solr 1.4 release dated Feb-9 2010. There is no custom code. I am using regular out of box dismax requesthandler. The query is a simple one with 4 filter queries (fq's) and one sort query. During the index generation, I delete a set of rows based on date filter, then add new

RE: Spellcheck help

2010-07-27 Thread Dyer, James
If you could, let me know how your testing goes with this change. I too am interested in having the Collate work as good as it can. It looks like the code would be better with this change but then again I don't know what the original author was thinking when this was put in. James Dyer E-Comm

Highlighting parameters wiki

2010-07-27 Thread Stephen Green
The wiki entry for hl.highlightMultiTerm: http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm doesn't appear to be correct. It says: If the SpanScorer is also being used, enables highlighting for range/wildcard/fuzzy/prefix queries. Default is false. But the code in Defaul

Re: Russian stemmer

2010-07-27 Thread Robert Muir
right, but your problem is this is the current output: Ковров -> Ковр Коврову -> Ковров Ковровом -> Ковров Коврове -> Ковров so, if Ковров was simply left alone, all your forms would match... 2010/7/27 Oleg Burlaca > Thanks Robert for all your help, > > The idea of ы[A-Z].* stopwords is ideal

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
Thanks Robert for all your help, The idea of ы[A-Z].* stopwords is ideal for the english language, although in russian nouns are inflected: Борис, Борису, Бориса, Борисом I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned it's more accurate). Once again thanks, Oleg Bur

RE: Spellcheck help

2010-07-27 Thread Marc Ghorayeb
Thanks for the input, i'll check it out! Marc > Subject: RE: Spellcheck help > Date: Fri, 23 Jul 2010 13:12:04 -0500 > From: james.d...@ingrambook.com > To: solr-user@lucene.apache.org > > In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84): > > final static String PATTERN =

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread Chantal Ackermann
Hi Mitch, thanks for the code. Currently, I've got a different solution running but it's always good to have examples. > > If realized > > that I have to throw an exception and add the onError attribute to the > > entity to make that work. > > > I am curious: > Can you show how to make a meth

question: solrCloud with multiple cores on each machine

2010-07-27 Thread Yatir Ben Shlomo
Hi, I am using SolrCloud. Suppose I have a total of 4 machines dedicated to Solr. I want to have 2 machines for replication (slaves) and 2 masters, but I want to work with 8 logical cores rather than 2, i.e. each master (and each slave) will have 4 cores on it. The reason is that I can optimize the cores on

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread MitchK
Hi Chantal, instead of: /* multivalued, not required */ you do: /* multivalued, not required */ The yourCustomFunctionToReturnAQueryString(vip, querystring1, querystring2) { if(vip != n

Re: slave index is bigger than master index

2010-07-27 Thread Peter Karich
> We have three dedicated servers for solr, two for slaves and one for master, > all with linux/debian packages installed. > > I understand that replication always copies over the index in an exact > form as in master index directory (or it is supposed to do that at least), > and if the mast

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread Chantal Ackermann
Hi Mitch, > New idea: > Create a method which returns the query-string: > > returnString(theVIP) > { >if ( theVIP != null || theVIP != "") >{ >return "a query-string to find the vip" >} >else >{ >return "SELECT 1" // you nee
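
A note on the quoted pseudocode: as written, `theVIP != null || theVIP != ""` is true for every input, because no value is both null and the empty string at once, so the "SELECT 1" branch is never reached. A minimal Python rendering with the two checks joined by "and" is sketched below; the function name and the placeholder query text are illustrative, not code from the thread.

```python
def query_for_vip(vip):
    """Return the query string to run for the sub-entity."""
    # Both conditions must hold: the value exists and it is not empty.
    if vip is not None and vip != "":
        return "a query-string to find the vip"  # placeholder, as in the quoted message
    # Cheap no-op query for rows that have no VIP value.
    return "SELECT 1"
```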

Re: LucidWorks 1.4 compilation

2010-07-27 Thread Eric Grobler
I did not realize the LucidWorks.jar comes with an option to install the sources :-) On Tue, Jul 27, 2010 at 10:59 AM, Eric Grobler wrote: > Good Morning, afternoon or evening... > > If someone installed Solr using the LucidWorks.jar (1.4) installation how > can one make a small change and recomp

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread MitchK
Hi Chantal, > However, with this approach indexing time went up from 20min to more > than 5 hours. > This is 15x slower than the initial solution... wow. From MySQL I know that IN ()-clauses are the embodiment of endlessness - they perform very, very badly. New idea: Create a method which

Re: NullPointerException with CURL, but not in browser

2010-07-27 Thread Rene Rath
Ouch! Absolutely correct - quoting the URL fixed it. Thanks for saving me a sleepless night! cheers - rene 2010/7/26 Chris Hostetter > > : However, when I'm trying this very URL with curl within my (perl) script, > I > : receive a NullPointerException: > : CURL-COMMAND: curl -sL > : > http://lo
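
The original URL is truncated in this listing, so the sketch below uses a hypothetical one. It assumes the failure came from characters such as `&` and `?` being interpreted by the shell, and shows how a script could quote the URL before building the curl command line.

```python
import shlex

# Hypothetical URL; the real one is cut off in the message above.
url = "http://localhost:8983/solr/select?q=solr&rows=10"

# shlex.quote wraps the value in single quotes so the shell passes it through unchanged.
command = "curl -sL " + shlex.quote(url)

# Splitting the command the way a POSIX shell would yields the URL as one argument.
arguments = shlex.split(command)
```

Without the quotes, a shell would treat the text after `&` as a separate command and curl would receive only part of the query string.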

DIH $deleteDocByQuery

2010-07-27 Thread Maddy.Jsh
Hi, I have been using DIH to index documents from a database. I am hoping to use DIH to delete documents from the index as well. I searched the wiki and found the special commands in DIH for doing so: http://wiki.apache.org/solr/DataImportHandler#Special_Commands But there is no example of how to use them. I tr

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-27 Thread Alessandro Benedetti
Hi Jon, Over the last few days we have faced the same problem. Using Solr 1.4.1 classic (Tika 0.4), we can't extract content from some PDF files, and for others Solr throws an exception during indexing. You must: update the Tika libraries (in /contrib/extraction/lib) with tika-core.0.8 snapshot

LucidWorks 1.4 compilation

2010-07-27 Thread Eric Grobler
Good Morning, afternoon or evening... If someone installed Solr using the LucidWorks.jar (1.4) installation how can one make a small change and recompile. Is there a LucidWorks (tomcat) build somewhere? Regards ericz

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread Chantal Ackermann
Hi Mitch, thanks for that suggestion. I wasn't aware of that. I've already added a temporary field in my ScriptTransformer that does basically the same. However, with this approach indexing time went up from 20min to more than 5 hours. The new approach is to query the solr index for that other d

Re: slave index is bigger than master index

2010-07-27 Thread Muneeb Ali
We have three dedicated servers for solr, two for slaves and one for master, all with linux/debian packages installed. I understand that replication always copies over the index in an exact form as in master index directory (or it is supposed to do that at least), and if the master index wa

Re: clustering component

2010-07-27 Thread Stanislaw Osinski
Hi Matt, I'm attempting to get the carrot based clustering component (in trunk) to > work. I see that the clustering contrib has been disabled for the time > being. Does anyone know if this will be re-enabled soon, or even better, > know how I could get it working as it is? > I've recently create

clustering component

2010-07-27 Thread Matt Mitchell
Hi, I'm attempting to get the carrot based clustering component (in trunk) to work. I see that the clustering contrib has been disabled for the time being. Does anyone know if this will be re-enabled soon, or even better, know how I could get it working as it is? Thanks, Matt

Re: Russian stemmer

2010-07-27 Thread Robert Muir
2010/7/27 Oleg Burlaca > Actually the situation with Немцов is ok, > I've just checked how Yandex works with Немцов and Немцова: > http://nano.yandex.ru/project/inflect/ > > I think there are two solutions: > a) manually search for both Немцов and then Немцова > b) use wildcard query: Немцов* >

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
Actually the situation with Немцов is ok, I've just checked how Yandex works with Немцов and Немцова: http://nano.yandex.ru/project/inflect/ I think there are two solutions: a) manually search for both Немцов and then Немцова b) use wildcard query: Немцов* Robert, thanks for the RussianLightStemF

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
A similar word is Немцов. The strange thing is that searching for "Немцова" will not find documents containing "Немцов" Немцова: 14 articles http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0 Немцов: 74 articles http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
Yes, I'm sure I've enabled SnowballPorterFilterFactory both at Index and Query time, because the search works ok, except names and geo locations. I've noticed that searching by Коврова also shows documents that contain Коврову, Коврове Search by Ковров, 7 results: http://www.sova-center.ru/searc

Re: Russian stemmer

2010-07-27 Thread Robert Muir
another look, your problem is ковров itself... it's mapped to ковр a workaround might be to use the protected words functionality to keep ковров and any other problematic people/geo names as-is. separately, in trunk there is an alternative russian stemmer (RussianLightStemFilterFactory), which mig
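
A toy illustration of the protected-words workaround described above. The suffix stripper below is a deliberately crude stand-in, not the Snowball algorithm; it merely reproduces the behaviour reported in this thread for this one name, and the suffix list is an assumption chosen for that purpose.

```python
# Toy suffix list, chosen only to mimic the outputs quoted in this thread.
SUFFIXES = ("ом", "ов", "а", "у", "е")


def toy_stem(word, protected=frozenset()):
    """Lower-case the word and strip one suffix unless the word is protected."""
    token = word.lower()
    if token in protected:
        return token  # protected words pass through the stemmer untouched
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


forms = ["Ковров", "Коврова", "Коврову", "Ковровом", "Коврове"]
unprotected = [toy_stem(form) for form in forms]
protected = [toy_stem(form, protected={"ковров"}) for form in forms]
```

Without protection the base form is reduced one step further than its inflections, so it no longer matches them; with the base form protected, all five forms end up as the same token.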

Spellchecking and frequency

2010-07-27 Thread dan sutton
Hi, I've recently been looking into Spellchecking in solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so the 'spellcheck.onlyMorePopular' is not really that useful unless you click on it nu

Re: Russian stemmer

2010-07-27 Thread Robert Muir
All of your examples stem to "ковров": assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове", new String[] { "ковров", "ковров", "ковров", "ковров" }); } Are you sure you enabled this at *both* index and query time? 2010/7/27 Oleg Burlaca > Hello, > > I'm using SnowballPorter

Russian stemmer

2010-07-27 Thread Oleg Burlaca
Hello, I'm using SnowballPorterFilterFactory with language="Russian". The stemming works ok except people names, geographical places. Here are some examples: searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове. Are there other stemming plugins for the russian language that

Any tips/guidelines for tuning Solr/Lucene performance in a master/slave/sharding environment

2010-07-27 Thread Chengyang
How can I reduce the index file size, decrease the sync time between nodes, and decrease the index create/update time? Thanks.

Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

2010-07-27 Thread David Stuart
I would use the string version, as Drupal will probably populate it with a URL-like value that may not validate as type url On 27 Jul 2010, at 04:00, Savannah Beckett wrote: > > I am trying to merge the schema.xml that is the solr/nutch setup with the one > from drupal apache solr mo

Re: Design questions/Schema Help

2010-07-27 Thread Chantal Ackermann
Hi, IMHO you can do this with date range queries and (date) facets. The DateMathParser will allow you to normalize dates on min/hours/days. If you hit a limit there, then just add a field with an integer for either min/hour/day. This way you'll lose the month information - which is sometimes what
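
A minimal sketch of how a date-facet request along these lines could be assembled on the client. It assumes the `facet.date` parameter family and date math such as `NOW/DAY`; the field name `timestamp` and the one-week window are hypothetical and not taken from the thread.

```python
from urllib.parse import parse_qs, urlencode


def date_facet_params(field, start, end, gap):
    """Build the query string for a date facet over one field."""
    return urlencode({
        "q": "*:*",
        "rows": 0,                 # only the facet counts are of interest
        "facet": "true",
        "facet.date": field,
        "facet.date.start": start,
        "facet.date.end": end,
        "facet.date.gap": gap,
    })


# One bucket per day for the last seven days, rounded to day boundaries.
query = date_facet_params("timestamp", "NOW/DAY-7DAYS", "NOW/DAY+1DAY", "+1DAY")
decoded = parse_qs(query)
```

URL-encoding matters here: a literal `+` in the gap would otherwise be decoded as a space on the server side.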