Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2010-10-04 Thread Andy
> > 1) hyphens - if user types "ema" or "e-ma" I want to > > suggest "email" > > > > 2) accents - if user types "herme" I want to suggest > > "Hermès" > > Accents can be removed using MappingCharFilterFactory > before the tokenizer. (both index and query time) > > mapping="mapping-ISOLatin1

Re: multi level faceting

2010-10-04 Thread Allistair Crossley
I think that is just sending 2 fq facet queries through. In Solr PHP I would do that with, e.g. $params['facet'] = true; $params['facet.field'] = array('Size'); $params['fq'] = array('sex' => array('Men', 'Women')); but yes I think you'd have to send through what the current facet query is and

Re: Prioritizing advectives in solr search

2010-10-04 Thread Otis Gospodnetic
Hi Hasnain, You'll need to apply POS (Part of Speech) tagging on the input at/before indexing, then store a payload with your adjective terms, and finally use those payload values to change the scoring at query time. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosyste

RE: DIH sub-entity not indexing

2010-10-04 Thread Ephraim Ofir
The closest you can get to debugging (without actually debugging...) is to look at the logs and use http://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode Ephraim Ofir -Original Message- From: Allistair Crossley [mailto:a...@roxxor.co.uk] Sent: Monday, October 04, 2

Re: Prioritizing advectives in solr search

2010-10-04 Thread Hasnain
Hi Otis, Thank you for replying. Unfortunately I'm unable to fully grasp what you are trying to say. Can you please elaborate on what a payload with adjective terms is? Also, I'm using stopwords.txt to stop adjectives, adverbs and verbs. Now when I search for "Blue hammers", solr searches for "

Re: Multiple masters and replication between masters?

2010-10-04 Thread Upayavira
On Mon, 2010-10-04 at 00:25 +0530, Arunkumar Ayyavu wrote: > I'm looking at setting up multiple masters for redundancy (for index > updates). I found the thread in this link > (http://www.lucidimagination.com/search/document/68ac303ce8425506/multiple_masters_solr_replication_1_4) > discussed this

Re: DIH sub-entity not indexing

2010-10-04 Thread Allistair Crossley
Hey, Yes that tool doesn't work too well for me. I can load it up and get the forms on the left, but when I run a debug the right hand side tells me that the page is not found. I *think* this is because I use a custom query string parameter in my DIH XML for use with delta querying and this bei

RE: DIH sub-entity not indexing

2010-10-04 Thread Ephraim Ofir
Make sure you're not running into a case sensitivity problem, some stuff in DIH is case sensitive (and some stuff gets capitalized by the jdbc). Try using listing.ID instead of listing.id. On a side note, if you're using mysql, you might want to look at the CONCAT_WS function. You might also want t

multi level faceting

2010-10-04 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Hi, I was wondering if there's a way to display facet options based on previous facet values. For example, I've seen many shopping sites where a user can facet by "Mens" or "Womens" apparel, then be shown "sizes" to facet by (for Men or Women only - whichever they chose). Is this somethi

Re: DIH sub-entity not indexing

2010-10-04 Thread Stefan Matheis
Allistair, Indeed, I have a column that already works using a lower-case id. I wish > I could debug it somehow - see the SQL? Something particular about this > config it is not liking. > you may want to try the MySQL Query-Log, to check which Queries are performed? http://dev.mysql.com/doc/refman

Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2010-10-04 Thread Andy
> I got your point. You want to retrieve "electric吉他" > with the query 吉他. That's why you don't want EdgeNGram. > If this is the only reason for NGram, I think you can > transform "electric吉他" into two tokens "electric" > "吉他" in TokenFilter(s) and apply EdgeNGram approach. > What TokenFilters

Re: DIH sub-entity not indexing

2010-10-04 Thread Allistair Crossley
Thanks Ephraim. I tried your suggestion with the ID but capitalising it did not work. Indeed, I have a column that already works using a lower-case id. I wish I could debug it somehow - see the SQL? Something particular about this config it is not liking. I read the post you linked to. This i

DIH sub-entity not indexing

2010-10-04 Thread Allistair Crossley
Hello list, I've been successful with DIH to a large extent but a seemingly simple extra column I need is posing problems. In a nutshell I have 2 entities let's say - Listing habtm Contact. I have copied the relevant parts of the configs below. I have run my SQL for the sub-entity Contact and t

Re: UpdateXmlMessage

2010-10-04 Thread Tod
On 10/1/2010 11:33 PM, Lance Norskog wrote: Yes. stream.file and stream.url are independent of the request handler. They do their magic at the very top level of the request. However, there are no unit tests for these features, but they are widely used. Sorry Lance, are you agreeing that I can

Re: DIH sub-entity not indexing

2010-10-04 Thread Allistair Crossley
Very clever thinking indeed. Well, that's certainly revealed the problem ... ${listing.id} is empty on my sub-entity query ... And this is because I prefix the indexed ID with a letter. This appears to modify the internal value of ${listing.id} for subsequent uses. Well, I can work around this now

Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2010-10-04 Thread Ahmet Arslan
> What TokenFilters would split "electric吉他" into > "electric" & "吉他"? Is it possible to write a regex to capture Chinese text? (Unicode range?) If yes, you can use PatternReplaceFilter to transform electric吉他 into electric_吉他. After that WordDelimiterFilterFactory can produce two adjacent to
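
To illustrate the idea of splitting on a Unicode range, here is a minimal Python sketch, not Solr configuration. It assumes that the CJK Unified Ideographs block (U+4E00 to U+9FFF) is a good enough definition of "Chinese text", which leaves out rarer characters. The helper names split_by_script and edge_ngrams are made up for this example.

```python
import re

# Runs of CJK Unified Ideographs (U+4E00-U+9FFF) or runs of ASCII letters/digits.
# Assumption: this range only covers the common Chinese characters.
SCRIPT_RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def split_by_script(tag):
    """Split a tag into runs of Chinese characters and runs of ASCII alphanumerics."""
    return SCRIPT_RUN.findall(tag)


def edge_ngrams(token, min_len=1):
    """Return every prefix of token, shortest first (what an edge n-gram filter indexes)."""
    return [token[:i] for i in range(min_len, len(token) + 1)]


tokens = split_by_script("electric吉他")
print(tokens)                  # ['electric', '吉他']
print(edge_ngrams(tokens[1]))  # ['吉', '吉他']
```

Once the tag is split this way, a query for 吉他 matches a prefix of the second token, so plain n-grams over the whole tag are not needed.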

Re: solr-user

2010-10-04 Thread Erick Erickson
I suspect you're not actually including the path to those jars. SolrException should be in your solrj jar file. You can test this by executing "jar -tf apacheBLAHBLAH.jar" which will dump all the class names in the jar file. I'm assuming that you're really including the version for the * in the sol

Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2010-10-04 Thread Ahmet Arslan
> Does anyone know how to deal with these 2 issues when using > NGramFilterFactory for autocomplete? > > 1) hyphens - if user types "ema" or "e-ma" I want to > suggest "email" > > 2) accents - if user types "herme" I want to suggest > "Hermès" Accents can be removed using MappingCharFilterFa
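
The effect of folding accents and hyphens on both the indexed tag and the typed input can be sketched in Python. This is only an illustration of the normalization step, not how MappingCharFilterFactory is implemented; plain substring matching stands in for the n-gram lookup, and the function names fold and suggestions are hypothetical.

```python
import unicodedata


def fold(text):
    """Lowercase, strip combining accents, and drop hyphens."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.replace("-", "")


def suggestions(typed, tags):
    """Return the tags whose folded form contains the folded input."""
    needle = fold(typed)
    return [t for t in tags if needle in fold(t)]


tags = ["email", "Hermès"]
print(suggestions("e-ma", tags))   # ['email']
print(suggestions("herme", tags))  # ['Hermès']
```

The point is that the same folding has to be applied at index time and at query time, otherwise "e-ma" and "email" never meet.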

More like this and terms positions

2010-10-04 Thread Xavier Schepler
Hi, does the MoreLikeThis search use term position information in the score formula?

Re: solr-user

2010-10-04 Thread Allistair Crossley
I updated the SolrJ JAR requirements to be clearer on the wiki page given how many of these SolrJ emails I saw coming through since joining the list. I just created a test java class and imported the removed JARs until I found out the minimal set required. On Oct 4, 2010, at 8:27 AM, Erick Eric

Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

2010-10-04 Thread Ahmet Arslan
> I agree with the issues with NGramFilterFactory you pointed > out and I really want to avoid using it. But the problem is > that I have Chinese tags like "电吉他" and multi-lingual > tags like "electric吉他". I got your point. You want to retrieve "electric吉他" with the query 吉他. That's why you don't

Re: DIH sub-entity not indexing

2010-10-04 Thread Allistair Crossley
I have tried a more elaborate join also following the features example of the DIH example but same result - SQL works fine directly but Solr is not indexing the array of full_names per Listing, e.g. Am I missin

Re: Autosuggest with inner phrases

2010-10-04 Thread Otis Gospodnetic
Or this: http://www.sematext.com/products/autocomplete/index.html , which happens to use the same "bass" examples as the original poster. :) You can see this Autosuggest in action on http://search-lucene.com/ . Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Origi

Re: Highlighting match term in bold rather than italic

2010-10-04 Thread Otis Gospodnetic
Hi, It's a matter of the config. Have a look at the highlighter section of solrconfig.xml. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: "efr...@gmail.com" > To: solr-user@lucene.

Re: More like this and terms positions

2010-10-04 Thread Robert Muir
On Mon, Oct 4, 2010 at 10:16 AM, Xavier Schepler < xavier.schep...@sciences-po.fr> wrote: > Hi, > > does the more like this search uses terms positions information in the > score formula ? > no, it would be nice if it did use them though (based upon query terms), seems like it would yield improve

Re: Multiple masters and replication between masters?

2010-10-04 Thread Otis Gospodnetic
Hi, Would this help you: http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Arunkumar Ayyavu > To: solr-user@lucene.apa

Re: More like this and terms positions

2010-10-04 Thread Xavier Schepler
On 04/10/2010 16:40, Robert Muir wrote: On Mon, Oct 4, 2010 at 10:16 AM, Xavier Schepler< xavier.schep...@sciences-po.fr> wrote: Hi, does the more like this search uses terms positions information in the score formula ? no, it would be nice if it did use them though (based upon qu

RE: solrj

2010-10-04 Thread Xin Li
I asked the exact question the day before. If you or anyone else has pointer to the solution, please share on the mail list. For now, I am using Perl script instead to query Solr server. Thanks, Xin -Original Message- From: ankita shinde [mailto:ankitashinde...@gmail.com] Sent: Saturday,

Re: solrj

2010-10-04 Thread Allistair Crossley
i rewrote the top jar section at http://wiki.apache.org/solr/Solrj and the following code then runs fine. import java.net.MalformedURLException; import org.apache.solr.client.solrj.SolrQuery; import org.apache.solr.client.solrj.SolrServer; import org.apache.solr.client.solrj.SolrServerException;

RE: multi level faceting

2010-10-04 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Ok. Thanks for the quick response. Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329 -Original Message- From: Allistair Crossley [m

RE: solrj

2010-10-04 Thread Xin Li
Thanks, Allistair. I will give it a try later today. -Original Message- From: Allistair Crossley [mailto:a...@roxxor.co.uk] Sent: Monday, October 04, 2010 11:31 AM To: solr-user@lucene.apache.org Subject: Re: solrj i rewrote the top jar section at http://wiki.apache.org/solr/Solrj an

RE: multi level faceting

2010-10-04 Thread Jason Brown
Yes, by adding fq back into the main query you will get results increasingly filtered each time. You may run into an issue if you are displaying facet counts, as the facet part of the query will also obey the increasingly filtered fq, and so not display counts for other categories anymore from
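
One commonly used way to keep counts for the other categories is Solr's tag/exclude local params on fq and facet.field. Whether that is what this message goes on to suggest is not visible in the preview, so treat the following Python sketch as an assumption. The field names sex and Size come from the thread; the function name facet_params is made up.

```python
from urllib.parse import urlencode


def facet_params(selected_sex):
    """Build request parameters that filter on sex but still count every sex value."""
    return [
        ("q", "*:*"),
        ("facet", "true"),
        # Tag the filter so that a facet can ignore it.
        ("fq", "{!tag=sex}sex:" + selected_sex),
        # Exclude the tagged filter when counting the sex facet.
        ("facet.field", "{!ex=sex}sex"),
        # The Size facet still obeys the filter, giving the second level.
        ("facet.field", "Size"),
    ]


print(urlencode(facet_params("Men")))
```

The encoded string is what would be appended to the select URL; repeated keys such as facet.field are sent once per value.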

Dismax Filtering Hyphens? Why is this not working? How do I debug Dismax?

2010-10-04 Thread Scott Gonyea
Wow, this is probably the most annoying Solr issue I've *ever* dealt with. First question: How do I debug Dismax, and its query handling? Issue: When I query against this StrField, I am attempting to do an *exact* match... Albeit one that is case-insensitive :). So, 90% exact. It works in a maj

having problem about Solr Date Field.

2010-10-04 Thread Kouta Osabe
Hi all, I have a problem with the Solr date field. The problem is as follows. SolrBean foo = new Bean(); // The type of the pubDate property is "java.util.Date" and rs is a "java.sql.ResultSet", so rs.getDate("pub_date") returns a java.sql.Date object. foo.pubDate = rs.getDate("pub_date"); the value of p

Re: Dismax Filtering Hyphens? Why is this not working? How do I debug Dismax?

2010-10-04 Thread Ahmet Arslan
> >     name="idstr"   class="solr.StrField"> >       >         class="solr.PatternTokenizerFactory" pattern="(.*)" > group="1"/> >           class="solr.LowerCaseFilterFactory"/> >       This definition is invalid. You cannot use charfilter/tokenizer/tokenfilter with solr.StrField. But it

Re: having problem about Solr Date Field.

2010-10-04 Thread Gora Mohanty
On Mon, Oct 4, 2010 at 10:24 PM, Kouta Osabe wrote: > Hi,All > > I have a problem about Solr Date Field. [...] > the value of pub_date column comes from MySQL and actually value is > "2010-10-05 00:00:00". > > I regist "foo" bean to Solr through SolrJ like "new > CommonsHttpSolrServer().addBean(f

Re: having problem about Solr Date Field.

2010-10-04 Thread Ahmet Arslan
> I expected "2010-10-05 00:00:00" to display by Solr Admin > but > "2010-10-04T15:00:00Z" displayed on Solr Admin. > > is this timezone problem?(I live in Tokyo Japan). Probably. Solr stores/converts dates in/to UTC timezone.
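
The nine-hour shift can be reproduced outside Solr. A small Python sketch, assuming the MySQL value was local Tokyo time (a fixed UTC+9 offset); the function name to_solr_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Tokyo: fixed UTC+9 offset, no DST


def to_solr_utc(local_dt):
    """Format a timezone-aware datetime as UTC with a trailing Z, as Solr displays dates."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


midnight_tokyo = datetime(2010, 10, 5, 0, 0, 0, tzinfo=JST)
print(to_solr_utc(midnight_tokyo))  # 2010-10-04T15:00:00Z
```

So the stored instant is the same; only the displayed timezone differs.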

Re: Dismax Filtering Hyphens? Why is this not working? How do I debug Dismax?

2010-10-04 Thread Scott Gonyea
Wow, that's pretty infuriating. Thank you for the suggestion. I added it to the Wiki, with the hope that if it contains misinformation then someone will correct it and, consequently, save me from another one of these experiences :) (...and to also document that, hey, there is a tokenizer which t

Re: SolrCore / Index Searcher Instances

2010-10-04 Thread entdeveloper
Makes sense. However, one of the reasons I was asking was that we've configured Solr to use RAMDirectory and it appears that it loads the index into memory twice. I suspect the first time is for warming firstSearcher and the second time is for warming newSearcher. It makes our jvm memory requiremen

How to update a distributed index?

2010-10-04 Thread bbarani
Hi, We are maintaining multiple Solr indexes, one for each source (the source data is too huge to be stored in a single index), and we are using shards to do a distributed search across all the Solr indexes. We also update the Solr documents (which were already indexed) using XML push http://server:8

Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
Hi, I asked this question a month ago on lucene-user and was referred here. I have content being analyzed in Solr using these tokenizers and filters: Basically I want to be able to search against

Re: Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package protected. On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch wrote: > Hi, >

Some Parser Resources / Links, and some related questions

2010-10-04 Thread Mark Bennett
I'm fiddling around with DisMax again, my love/hate relationship continues. Basically I'm hunting for custom parser examples. But I've also got some links that might be helpful to others. Alternate XML Syntax: Project: XML Query Syntax is being worked on, in SOLR-839 Link: https://issues.apache.

Re: Solr with example Jetty and score problem

2010-10-04 Thread Floyd Wu
Hi Chris, Thanks. But do you have any suggestion or work-around to deal with it? Floyd 2010/10/2 Chris Hostetter > > : But when I issue the query with shard(two instances), the response XML > will > : be like following. > : as you can see, that score has been transferred to an element of >..

Different between Lucid dist. & Apache dist. ?

2010-10-04 Thread Floyd Wu
Hi there, What is the difference between Lucid distribution of Solr and Apache distribution? And can I use Lucid distribution for free in my commercial project?

Re: How to update a distributed index?

2010-10-04 Thread Otis Gospodnetic
Hi, Even with the sharded index you typically go to the master(s) that has/have all your shards and update the appropriate shard (in your case, based on the doc type). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/   -

Re: Prioritizing advectives in solr search

2010-10-04 Thread Otis Gospodnetic
Hi, If you want "blue" to be used in search, then you should not treat it as a stopword. Re payloads: http://search-lucene.com/?q=payload+score and http://search-lucene.com/?q=payload+score&fc_type=wiki (even better, look at hit #1) Otis Sematext :: http://sematext.com/ :: Solr - Lucene

Re: Prioritizing advectives in solr search

2010-10-04 Thread Walter Underwood
I think this is a bad idea. The tf.idf algorithm will already put a higher weight on "hammers" than on "blue", because "hammers" will be more rare than "blue". Plus, you are making huge assumptions about the queries. In a search for "Canon camera", "Canon" is an adjective, but it is the importan
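
The claim that a rarer term already gets more weight can be checked with a small calculation. The sketch below uses the idf formula from Lucene's classic DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)); the document counts are hypothetical numbers chosen for illustration, not figures from this thread.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's classic DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 10_000                          # hypothetical index size
DOC_FREQ = {"blue": 2_000, "hammers": 50}  # hypothetical document frequencies

for term, df in DOC_FREQ.items():
    print(term, round(classic_idf(df, NUM_DOCS), 3))
```

With these counts "hammers" gets a clearly higher idf than "blue", with no special handling of adjectives.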

Re: multi level faceting

2010-10-04 Thread Otis Gospodnetic
Hi, I *think* this is not what Vincent was after. If I read the suggestions correctly, you are saying to use &fq=x&fq=y -- multiple fqs. But I think Vincent is wondering how to end up with something that will let him create a UI with multi-level facets (with a single request), e.g. Footwear (1

Differences between FilterFactory and TokenizerFactory?

2010-10-04 Thread Andy
There are EdgeNGramFilterFactory & EdgeNGramTokenizerFactory. Likewise there are StandardFilterFactory & StandardTokenizerFactory. LowerCaseFilterFactory & LowerCaseTokenizerFactory. Seems like they always come in pairs. What are the differences between FilterFactory and TokenizerFactory? When

ant build problem

2010-10-04 Thread satya swaroop
Hi all, i updated my solr trunk to revision 1004527. when i go for compiling the trunk with ant i get so many warnings, but the build is successful. the warnings are here::: common.compile-core: [mkdir] Created dir: /home/satya/temporary/trunk/lucene/build/classes/java [javac] Compi

Solr admin level configurations for production

2010-10-04 Thread Siva Prasad Janapati
Hi, We configured Solr search for a large data set (~6 million records). To increase search performance, are there any admin-level configurations we need to do? Please suggest some admin-level settings. Regards, Siva

Numeric search in text field

2010-10-04 Thread javaxmlsoapdev
Hello, I have a string "Marsh 1" (no quotes while searching). If I put "Marsh 1" in the search box with no quotes I get the expected results back, but when I search for just "1" (again no quotes) I don't get any results back. I use WordDelimiterFilterFactory as follows. Any idea? --

Re: Solr UIMA integration

2010-10-04 Thread maheshkumar
Hi Tommaso, I have registered on both sites and got the API keys. But I am getting a new error. Oct 4, 2010 6:15:04 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405) SEVERE: Exception occurred org.apache.uima.analysis_engine.AnalysisEnginePr

RE: multi level faceting

2010-10-04 Thread Ephraim Ofir
Take a look at "Mastering the Power of Faceted Search with Chris Hostetter" (http://www.lucidimagination.com/solutions/webcasts/faceting). I think there's an example of what you're looking for there. Ephraim Ofir -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.co

Tuning Solr

2010-10-04 Thread Floyd Wu
Hi there, If I don't need MoreLikeThis, spellcheck or highlighting, can I remove those configuration sections from solrconfig.xml? In other words, does Solr load and use these SearchComponents on startup and during runtime? Will removing this configuration speed up queries or not? Thanks

Begins with and ends with word

2010-10-04 Thread Maddy.Jsh
Hi, I have 2 documents with the following values. Doc1 Subject: Weekly transport Doc2 Subject: Week report on transportation I need to search documents in 4 formats 1. Begins with “week” It should return documents which have "week" as the first word, i.e. doc1 2. Begins with “week*” It shoul