Re: Improving Solr Spell Checker Results

2012-01-20 Thread David Radunz
Hey, Thanks so much for your outstanding response. I have been busy for a few days so have not had a chance to try it out. I have now tried to install trunk of Solr and when I run 'ant test' I encounter the following: [junit] Testsuite: org.apache.lucene.facet.taxonomy.directory.Tes

Re: Question on Reverse Indexing

2012-01-20 Thread Dmitry Kan
Shyam, The thing is that in order to use the leading wildcard, you don't necessarily need to use ReversedWildcardFilterFactory. There is another way to do this, which was turned off by default due to its inefficiency for the case of big term dictionaries. Not sure if this has changed in solr 4.0 t

RE: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread nibing
Hi, Ted Dunning, Thank you for your reply. I can understand your point on putting a "language_s" field and then keeping all the files together, which speeds up searching. But then a problem arises with choosing the analyzer at index time. I assume files encoded in different language should be ha

Re: Searching partial phone numbers

2012-01-20 Thread marotosg
Hi. I found the solution for that. You can apply a new filter to that field. It's possible to define a text field type with a new filter ** That means you will generate the reverse of the phone number. For instance, 08774589 reversed is 98547780. Because * only works at the
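The reversal idea in this message can be sketched outside Solr. This is only an illustration of the principle, not the poster's actual filter configuration (which is elided above): if the indexed value is the reversed number, a search for the trailing digits becomes an ordinary trailing-wildcard query. The field name `phone_rev` is hypothetical.

```python
def reversed_suffix_query(field, suffix):
    """Build a trailing-wildcard query for a field that stores reversed values.

    Searching for numbers ending in `suffix` is the same as searching the
    reversed field for values starting with the reversed suffix.
    """
    return f"{field}:{suffix[::-1]}*"


# The value as it would be stored in the (hypothetical) reversed field.
stored = "08774589"[::-1]

# Query for phone numbers ending in 4589.
query = reversed_suffix_query("phone_rev", "4589")
print(stored, query)
```

The stored value starts with the reversed suffix, so the wildcard sits at the end of the term, where it is cheap to evaluate.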

highlight result given by surround query parser

2012-01-20 Thread manyutomar
I am using the surround parser to perform span queries and getting the required result, but I want to highlight the term in the result set, and the highlighter, I guess, does not support the surround query parser. Are there any plugins or patches available to do the same? I guess highlighting should use surround que

Search within words

2012-01-20 Thread jawedshamshedi
Hi all, I want to search a string the same way MySQL LIKE does, i.e. if the word is, say, Sunrise, then searching for rise, sun, or sunrise should each return sunrise. The search should not be cas

Re: How to import data from xml files to solr

2012-01-20 Thread Jan Høydahl
Hi, Note that there is yet another option, the XSLT UpdateRequestHandler http://wiki.apache.org/solr/XsltUpdateRequestHandler (which obviously needs better documentation). It can take arbitrary XML in, along with a stylesheet for transformation, and voila :) I made a stylesheet to import sear

Wildcard query with uppercase characters gets no result in edismax handler

2012-01-20 Thread Matthias Müller
Hi, I'm using an edismax handler. All fields and queries are lower case (LowerCaseFilterFactory in schema.xml). Queries for television, Television and televisio* lead to results. But Televisio* has no result. Is this a bug, a feature or a misconfiguration? Kind Regards Matthias

Re: Just can't get Solritas to work, help!

2012-01-20 Thread remi tassing
So I erased my solr folder and started from scratch. From the example folder I ran "java -jar start.jar" but there was a solrconfig.xml missing. I copied this file from Solr-3.4.0 to my Solr-3.5.0 folder. Now http://localhost:8983/solr/admin works but http://localhost:8983/solr/browse gives me this res

Re: Wildcard query with uppercase characters gets no result in edismax handler

2012-01-20 Thread Tomás Fernández Löbbe
You'll get this same behavior with edismax or lucene QP. Wildcard queries are not analyzed (not the lowercase filter nor any other). 2012/1/20 Matthias Müller > Hi, > > I'm using an edismax handler > All fields and queries are lower case (LowerCaseFilterFactory in > schema.xml) > > Queries for
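Since wildcard terms bypass analysis, one client-side workaround is to lowercase them before sending the query. The sketch below is an assumption-laden illustration, not something proposed in the thread: it assumes plain whitespace-separated terms with no field prefixes, and the function name is made up.

```python
def lowercase_wildcard_terms(query):
    """Lowercase only the terms that contain a wildcard (* or ?).

    Non-wildcard terms are left alone because Solr will run them through
    the field's analyzer (including LowerCaseFilterFactory) anyway, and
    operators such as AND/OR must keep their case.
    """
    terms = []
    for term in query.split():
        if "*" in term or "?" in term:
            term = term.lower()
        terms.append(term)
    return " ".join(terms)


print(lowercase_wildcard_terms("Televisio* AND Television"))
```

Fielded terms such as `Title:Foo*` would need extra handling so that the field name is not lowercased too.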

How to get the time document was indexed?

2012-01-20 Thread ola nowak
Hi, I want to be able to tell when the document was indexed, so I could re-index it if it has changed in the meantime. Is there an easy way to do this? Or do I have to manually put the date in the document and add a new field in the schema? Thanks, Alex

Re: How to get the time document was indexed?

2012-01-20 Thread Tommaso Teofili
Hi Alex, you can create a field in the schema.xml of type date or tdate called (something like) idx_timestamp and set its default option to NOW. Then you won't have to add any extra fields to the documents because it will be populated automatically when documents are indexed. Hope it helps. Tommaso 2
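Once such a field exists, a client could look for documents indexed before a given moment with a date range query. A minimal sketch, assuming the field is named `idx_timestamp` as suggested above and that the caller passes a timezone-aware datetime; the helper names are hypothetical:

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a timezone-aware datetime as a UTC timestamp Solr accepts."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def indexed_before(field, cutoff):
    """Build a range query matching documents indexed at or before `cutoff`."""
    return f"{field}:[* TO {solr_date(cutoff)}]"


cutoff = datetime(2012, 1, 20, 8, 15, tzinfo=timezone.utc)
print(indexed_before("idx_timestamp", cutoff))
```

Comparing the stored timestamp with the source record's last-modified time would then tell you which documents are candidates for re-indexing.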

Re: HIbernate Search and SOLR Integration

2012-01-20 Thread Anderson vasconcelos
Otis, Isn't the DataImportHandler only for importing data from a database? I don't want to import data from the database. I just want to persist the object in my database and afterwards send this saved object to SOLR. When the user finds some document using the SOLR search, I need to return this persistent ob

restrict fuzzy search to longer words

2012-01-20 Thread Lance
Hi, Could you please help me with a quick question - Is there a way to restrict lucene/solr fuzzy search to only analyze words that have more than 5 characters and to ignore shorter words (i.e. words with fewer than 6 characters)? Thanks - Lance

Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-20 Thread Daniel Brügge
Erick, yes, currently I have 6 shards, which accept writes and reads. Sometimes I delete data from all 6 and try to balance them, fill them up respectively, so they have approx. the same amount of data on it. So all 6 are 'in motion' somehow. I would like that the writing would take place more oft

Re: How to get the time document was indexed?

2012-01-20 Thread Hector Castro
As Tommaso said, adding a field to the schema.xml gives you an automatic timestamp set at index time. The default schema.xml with Solr 3.5.0 has a commented example: -- Hector On Jan 20, 2012, at 8:15 AM, Tommaso Teofili wrote: > Hi Alex, > you can create a field in the schema.xml o

Re: Search within words

2012-01-20 Thread Otis Gospodnetic
Hello, You can accomplish this by using n-grams or edge n-grams, which you'll use as field types for fields where you want such matching to occur and that you will specify in schema.xml.  I hope this helps. Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performa
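To see why n-grams give LIKE-style matching, here is a small sketch of what the two tokenizations produce. It only mimics the idea; it is not Solr's implementation, and the default gram sizes are arbitrary assumptions.

```python
def edge_ngrams(token, min_size=1, max_size=25):
    """Prefixes of the lowercased token, from min_size up to max_size characters."""
    token = token.lower()
    longest = min(max_size, len(token))
    return [token[:size] for size in range(min_size, longest + 1)]


def ngrams(token, min_size=1, max_size=25):
    """All substrings of the lowercased token with a length in [min_size, max_size]."""
    token = token.lower()
    longest = min(max_size, len(token))
    grams = []
    for size in range(min_size, longest + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


print(edge_ngrams("Sunrise", min_size=3))
print("rise" in ngrams("Sunrise", min_size=3))
```

Edge n-grams cover prefix searches such as "sun"; matching "rise" in the middle or at the end of "Sunrise" needs full n-grams, at the cost of a larger index.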

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-20 Thread Otis Gospodnetic
That's valuable info there. :) So then I wonder which of the two, RAM or SSD, has a more favorable price/size trajectory... Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - > From: Ted Dunning > To:

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-20 Thread Peter Velikin
Ted, Otis, Thanks for the info. I’ll take a stab at answering your question. RAM: Both of you are correct that if you were able to keep your index in RAM, that would give you the fastest results. This works if you have a small enough index. At ZoomInfo, the index was 600 GB (they have mu

Re: HIbernate Search and SOLR Integration

2012-01-20 Thread Otis Gospodnetic
Hi, If you save all fields you want to display in search results, then you don't need to go to the database at search time. If you do not save all fields you want to display in search results, then you will need to first query Solr, get IDs of all matches you want to display, and then from your

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Otis Gospodnetic
Ni, Bing I believe you will need to pre-define fields for all languages you want to handle and specify an appropriate language-specific analyzer for each of those fields. This also means that if you encounter a new language, you will need to adjust your schema to support a new language.  Of cou

Re: HIbernate Search and SOLR Integration

2012-01-20 Thread Anderson vasconcelos
Ok. I thought there was an easier way to do this using hibernate search. I will do this manually. Thanks for the help 2012/1/20 Otis Gospodnetic > Hi, > > If you save all fields you want to display in search results, then you > don't need to go to the database at search time. > If you do not sa

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-20 Thread Ted Dunning
It sounds bad with a 600GB index, but the techniques in the UMass achieve a substantial compression of the in-memory size (remember that only part of the index needs to be memory resident). If you assume that you get 2x compression from compression and elision then you only need 3-5 fat-memory mac

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Ted Dunning
Write a tokenizer that does language ID and then picks which tokenizer to use. Then record the language in the language id field. What is there to elaborate? On Fri, Jan 20, 2012 at 1:58 AM, nibing wrote: > But then there occurs a problem of using analyzer in indexing. I assume > files encoded

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Ted Dunning
Otis, Can you say why there needs to be a field per language? Why not have a polyglot analyzer? On Fri, Jan 20, 2012 at 7:29 AM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Ni, Bing > > I believe you will need to pre-define fields for all languages you want to > handle and specify a

EdgeNGramTokenizer not working

2012-01-20 Thread mmg
Hi, I'm trying to define an EdgeNGram field but for some reason it doesn't work. My fieldType definition is:

Re: EdgeNGramTokenizer not working

2012-01-20 Thread Rafał Kuć
Hello! Do you use the 'text' field for searching or the 'name' field ? Remember that, when you use copyField the data that is copied is the original data, not the analyzed one. -- Regards, Rafał Kuć > Hi, > I'm trying to define an EdgeNGram field but for some reason it doesn't work. > My fiel

Re: Wildcard query with uppercase characters gets no result in edismax handler

2012-01-20 Thread Erick Erickson
This is fixed for many cases in 3.6 (i.e. current but unreleased 3.x code line) and trunk, see: https://issues.apache.org/jira/browse/SOLR-2438 Best Erick 2012/1/20 Tomás Fernández Löbbe : > You'll get this same behavior with edismax or lucene QP. Wildcard queries > are not analyzed (not the lowe

Re: EdgeNGramTokenizer not working

2012-01-20 Thread mmg
Thank you for your reply. I think that's probably the problem then. Is there any way I can do this: I have a list of programs. Each program has a name, keywords, description and username. When I perform a search, I need to search all of those fields at once. This is why I used a copyfield to copy e

Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-20 Thread Erick Erickson
There will be some increased pressure on your resources when replication happens to the slaves. That said, you can also allocate resources differently between the two. For instance, you do not need any memory for the RAMBuffer on the slaves since you're not indexing. On the master, you don't need an

Phonetic search for portuguese

2012-01-20 Thread Anderson vasconcelos
Hi, Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for the English language, or do they work for other languages? Is there a phonetic filter for Portuguese? If not, how can I implement one? Thanks

Re: EdgeNGramTokenizer not working

2012-01-20 Thread Rafał Kuć
Hello! Look at the dismax (http://wiki.apache.org/solr/DisMaxQParserPlugin) query parser and the qf parameter. With dismax (or edismax) you can make a query like: q=user query&qf=name keywords description username and Solr will make the query to all the fields specified by the qf parameter. --
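The request parameters described here can be assembled on the client like this. A minimal sketch: the field names come from the thread, `defType=edismax` selects the query parser, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def dismax_params(user_query, fields):
    """Encode q, defType and qf so the query is run against every listed field."""
    return urlencode({
        "q": user_query,
        "defType": "edismax",
        "qf": " ".join(fields),  # qf takes a space-separated field list
    })


params = dismax_params("user query", ["name", "keywords", "description", "username"])
print(params)
```

The resulting string can be appended to the select handler URL, which avoids the copyField approach entirely because each field keeps its own analysis.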

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-20 Thread Erick Erickson
Peter: I admit I've just scanned the thread, but it sounds like what you're really doing under the covers is configuring your system to utilize the SSDs as where your pages go when they're swapped out of RAM, is this correct? Which would certainly speed things up substantially if swapping was happen

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Erick Erickson
bq: Why not have a polyglot analyzer That could work, but it makes some compromises and assumes that your languages are "close enough", I have absolutely no clue how that would work for English and Chinese say. But it also introduces inconsistencies. Take stemming. Even though you could easily st

Re: EdgeNGramTokenizer not working

2012-01-20 Thread mmg
That looks like a good solution. I'm pretty new with Solr, so I'm not sure how I should implement it. I looked at the documentation and I *think* I need to modify the search requestHandler in the solrconfig.xml file, is this correct? If I define it like this: explicit 10 edism

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-20 Thread Peter Velikin
Hi Erick, This is correct. An additional benefit to configuring the SSD as cache vs primary storage is that you don't have to change anything in your existing indexes (the cache will just give a performance boost). In addition to configuring the system to utilize SSDs as the location where

Re: EdgeNGramTokenizer not working

2012-01-20 Thread Rafał Kuć
Hello! I think it should work with SolrJ, it shouldn't be a problem. You don't have to modify the handler, you can specify those parameters at query time. But if you won't change them and they will be constant, you can modify the solrconfig.xml file. And you don't have to remove the defaultSearc

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Ted Dunning
I think you misunderstood what I am suggesting. I am suggesting an analyzer that detects the language and then "does the right thing" according to the language it finds. As such, it would tokenize and stem English according to English rules, German by German rules and would probably do a sliding

Re: EdgeNGramTokenizer not working

2012-01-20 Thread mmg
Thanks a lot for your help. I'll try it out. I didn't see a defType on the SolrQuery object, this is why I thought it had to be set in the config. Or is queryType the same as defType? -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeNGramTokenizer-not-working-tp3675926p36760

Re: Just can't get Solritas to work, help!

2012-01-20 Thread remi tassing
The tutorial works with Solr-3.4.0! Should the tutorial be updated with newer versions? Remi On Friday, January 20, 2012, remi tassing wrote: > So erase my solr folder and started from scratch. > From the example folder I "java -jar start.jar" but there was a solrconfig.xml missing. I copied th

Sort for Retrieved Data

2012-01-20 Thread Bing Li
Dear all, I have a question when sorting retrieved data from Solr. As I know, Lucene retrieves data according to the degree of keyword matching on text field (partial matching). If I search data by string field (complete matching), how does Lucene sort the retrieved data? If I add some filters,

RE: Different mm for spellcheckquery

2012-01-20 Thread Dyer, James
I thought of a way you could do this with one query, if using edismax. If you use "spellcheck.q" and insert "AND" between each keyword you'll make all the terms required regardless of the "mm" parameter. I quickly tried this out and it seems to work if you use "AND" but not if you prefix all t
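Building the "spellcheck.q" value described here is a one-liner on the client. A sketch under the assumption that the user input is plain whitespace-separated keywords; the function name is made up.

```python
def spellcheck_q(keywords):
    """Join the user's keywords with AND so every term is required."""
    return " AND ".join(keywords.split())


print(spellcheck_q("solr spel cheker"))
```

The main "q" parameter stays untouched, so the regular search still honors "mm" while the spellchecker's query treats all terms as mandatory.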

Getting a word count frequency out of a page field

2012-01-20 Thread solr user
SOLR reports the term occurrence for terms over all the documents. I am having trouble making a query that returns the term occurrence in a specific page field called documentPageId. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term "

Re: Just can't get Solritas to work, help!

2012-01-20 Thread Erik Hatcher
On Jan 20, 2012, at 13:23 , remi tassing wrote: > The tutorial works with Solr-3.4.0! It works for 3.5 too... via Jetty as prescribed by the tutorial. No? > Should the tutorial be updated with newer versions? Have you tried the instructions here? http://www.lucidimagination.com/search/doc

RE: Just can't get Solritas to work, help!

2012-01-20 Thread Steven A Rowe
Erik, I've already backported SOLR-2718 - is that what you were referring to when you said you would fix 3.6? Steve > -Original Message- > From: Erik Hatcher [mailto:erik.hatc...@gmail.com] > Sent: Friday, January 20, 2012 4:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Just ca

Re: Just can't get Solritas to work, help!

2012-01-20 Thread Erik Hatcher
Steve - sorry... yeah, that one. I missed your backport as of yesterday. I'll give it a whirl, but I'm confident all is well. Thanks! Erik On Jan 20, 2012, at 16:34 , Steven A Rowe wrote: > Erik, > > I've already backported SOLR-2718 - is that what you were referring to when > you

RE: Question about sorting by a field

2012-01-20 Thread federico.wachs
Yes, that works! I had to boost the firstDestination field to have it well sorted. Any ideas why the score might be equal for all the documents returned? Thanks a lot! Federico -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-sorting-by-a-field-tp3673491p36768

Parameter for database host in DIH?

2012-01-20 Thread Walter Underwood
Is there a way to parameterize the JDBC URL in the data import handler? I tried this, but it did not insert the value of the property. I'm running Solr 3.3.0.

Validating solr user query

2012-01-20 Thread Dipti Srivastava
Hi All, I am using HTTP/JSON to search my documents in Solr. Now the client provides the query on which the search is based. What is a good way to validate the query string provided by the user? On the other hand, if I want the user to build this query using some Solr api instead of preparing a

Re: Parameter for database host in DIH?

2012-01-20 Thread Shawn Heisey
On 1/20/2012 3:48 PM, Walter Underwood wrote: Is there a way to parameterize the JDBC URL in the data import handler? I tried this, but it did not insert the value of the property. I'm running Solr 3.3.0. Here's what I've got in mine. I pass in dbHost and dbSchema parameters (along wit

Re: Parameter for database host in DIH?

2012-01-20 Thread Walter Underwood
On Jan 20, 2012, at 3:34 PM, Shawn Heisey wrote: > On 1/20/2012 3:48 PM, Walter Underwood wrote: >> Is there a way to parameterize the JDBC URL in the data import handler? I >> tried this, but it did not insert the value of the property. I'm running >> Solr 3.3.0. >> >> >url="jdb

Re: Parameter for database host in DIH?

2012-01-20 Thread Walter Underwood
Weird. I can make it work with a request parameter and $dataimporter.request.dbhost: http://localhost:8983/solr/textbooks/dataimport?command=full-import&dbhost=mydbhost Or I can make it work with a Java system property with no dots. But when I use a Java system property with internal dots, it d

frange with multi-valued fields

2012-01-20 Thread Russell Black
Has anyone had experience using frange with multi-valued fields? In solr 3.5 doing so results in the error: "can not use FieldCache on multivalued field" Here's the use case. We have multiple years attached to each document and want to be able to refine by a year range. We're currently usin

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Jan Høydahl
Another benefit of a separate field per language is that TF/IDF stats are correct for each individual language. Also if you KNOW the query language, you can target THAT field alone, but if you don't know, you can throw the query at multiple fields, which will each get proper analysis (at the risk o

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-20 Thread Ted Dunning
The TF-IDF argument is a reasonable one. On Fri, Jan 20, 2012 at 5:33 PM, Jan Høydahl wrote: > Another benefit with separate field per lang is that TF/IDF stats gets > correct for each individual language. > Also if you KNOW the query language, you can target THAT field alone, but > if you don't

RE: Question on Reverse Indexing

2012-01-20 Thread Shyam Bhaskaran
Dmitry, I did not find the field "boolean allowLeadingWildcard" in the org.apache.lucene.queryParser.QueryParser class file or anywhere else in the source code. But setAllowLeadingWildcard() is being set to true in the org.apache.solr.search.SolrQueryParser class file as shown below public

Re: Parameter for database host in DIH?

2012-01-20 Thread Shawn Heisey
On 1/20/2012 5:01 PM, Walter Underwood wrote: Weird. I can make it work with a request parameter and $dataimporter.request.dbhost: http://localhost:8983/solr/textbooks/dataimport?command=full-import&dbhost=mydbhost Or I can make it work with a Java system property with no dots. But when I use