Re: getting error when ":" in the query
> However I would also like to know: is there any short way to put "\" before
> special characters which will not affect performance?

There is a static method in org.apache.lucene.queryParser.QueryParser that does
this: QueryParser.escape(String s)
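For reference, the escaping that QueryParser.escape performs is roughly the following: each character with special meaning in the Lucene query syntax gets a backslash prefix. This is a sketch of the behavior, not Lucene's actual source; the character set follows the query-syntax documentation.

```java
public class EscapeSketch {
    // Roughly what QueryParser.escape does: prefix every Lucene query
    // syntax character with a backslash, leaving everything else alone.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Special characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
            if ("\\+-!():^[]\"{}~*?|&".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("title:foo*")); // title\:foo\*
    }
}
```

Running the raw user input through this before handing it to the query parser avoids syntax errors from stray colons and the like.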
Re: multi term, multi field, auto suggest
On 29.01.2010, at 15:40, Lukas Kahwe Smith wrote:

> I am still a bit unsure how to handle both the lowercased and the case
> preserved version:
>
> So here are some examples:
> UBS => ubs|UBS
> Kreuzstrasse => kreuzstrasse|Kreuzstrasse
>
> So when I type "Kreu" I would get a suggestion of "Kreuzstrasse" and with
> "kreu" I would get "kreuzstrasse".
> Since I do not expect any words to start with a lowercase letter and still
> contain some upper case letter, we should be fine with this approach.
>
> As in I doubt there would be stuff like "fooBar", which would lead to
> suggesting both "foobar" and "fooBar".
>
> How can I achieve this?

I just noticed that I need the same thing for the word delimiter splitter:
some way to index both the split and the unsplit version so that I can use it
in a facet search.

Hans-Peter => Hans|Peter|Hans-Peter

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Re: multi term, multi field, auto suggest
On 01.02.2010, at 13:27, Lukas Kahwe Smith wrote:

> On 29.01.2010, at 15:40, Lukas Kahwe Smith wrote:
>
>> I am still a bit unsure how to handle both the lowercased and the case
>> preserved version:
>>
>> So here are some examples:
>> UBS => ubs|UBS
>> Kreuzstrasse => kreuzstrasse|Kreuzstrasse
>>
>> So when I type "Kreu" I would get a suggestion of "Kreuzstrasse" and with
>> "kreu" I would get "kreuzstrasse".
>> Since I do not expect any words to start with a lowercase letter and still
>> contain some upper case letter, we should be fine with this approach.
>>
>> As in I doubt there would be stuff like "fooBar", which would lead to
>> suggesting both "foobar" and "fooBar".
>>
>> How can I achieve this?
>
> I just noticed that I need the same thing for the word delimiter splitter:
> some way to index both the split and the unsplit version so that I can use
> it in a facet search.
>
> Hans-Peter => Hans|Peter|Hans-Peter

Sorry for the monolog. I did see
http://www.mail-archive.com/solr-user@lucene.apache.org/msg29786.html, which
suggests a solution just for lowercase indexing with mixed-case suggest, via
concatenating the lowercased version, some separator, and the original
version.

I guess what I could do is feed in the same data multiple times and build the
[indexterm]|[original] form in user land, so that "Hans-Peter" would be turned
into 3 documents:

hans|Hans-Peter
peter|Hans-Peter
hans-peter|Hans-Peter

This solution would be quite cool indeed, since I could suggest "Hans-Peter"
if someone searches for "Peter". Since I will just use this for a prefix
search, I could set the query analyzer to lowercase the search, and it should
find the results; I can then add some magic to the frontend display logic to
split off the suggested original term.

I am not aware of any magic inside the schema.xml that could do this work for
me though. I am using the DatabaseHandler to load the documents.
I guess I could simply run the query multiple times, but that would screw up
the indexing of the non-auto-suggest index. Then again, maybe I want to
totally separate the two anyway.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
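The user-land generation described above can be sketched in a few lines. This is a hypothetical helper (names and the "-" delimiter are assumptions, not Solr API): it emits one indexterm|original pair per split sub-word plus one for the whole lowercased token, which are the three documents from the Hans-Peter example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class SuggestDocs {
    // Build indexterm|original pairs for one source token: one entry per
    // sub-word split on "-", plus the whole token lowercased. The set
    // dedupes tokens that have no delimiter (e.g. "UBS").
    static Set<String> variants(String original) {
        Set<String> out = new LinkedHashSet<>();
        for (String part : original.split("-")) {
            out.add(part.toLowerCase(Locale.ROOT) + "|" + original);
        }
        out.add(original.toLowerCase(Locale.ROOT) + "|" + original);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("Hans-Peter"));
        // [hans|Hans-Peter, peter|Hans-Peter, hans-peter|Hans-Peter]
        System.out.println(variants("UBS")); // [ubs|UBS]
    }
}
```

Each emitted string would become its own suggest document; the frontend splits on "|" to display the original-case form.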
Problem in indexing on large data set by Dataimporthandler in solr
> Hi,
>
> I am trying to index some large set of data in solr using
> dataimporthandler. It is working fine for a small set, but when I try to
> index a large set it produces an error.
>
> I am using solr version 1.3 and mysql version Ver 14.7 Distrib 4.1.20, for
> redhat-linux-gnu (i686)
>
> Earlier I was facing a java heap space error:
>
> Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space
>
> Got it solved by allocating java -Xmx512M -Xms512M -jar start.jar
>
> But now it is giving a mysql communication link failure error:
>
> com.mysql.jdbc.CommunicationsException: Communications link failure due to
> underlying exception:
>
> ** BEGIN NESTED EXCEPTION **
> java.io.EOFException
> STACKTRACE:
> java.io.EOFException
>     at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1913)
>
> I had added transactionIsolation="TRANSACTION_READ_COMMITTED"
> holdability="CLOSE_CURSORS_AT_COMMIT" as suggested by the solr wiki page,
> but it is still not working.
>
> Here is my data-config.xml file:
>
> url="jdbc:mysql://localhost/databasename"
> user="xxx" password="" batchSize="-1" readOnly="true"
> autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED"
> holdability="CLOSE_CURSORS_AT_COMMIT"/>
>
> transformer="TemplateTransformer">
>
> My table contains around 1.5 crore of data on the local machine. On
> production it contains around 4 times the data as on local.
>
> Thank you,
> Vijayant Kumar
> Software Engineer
> Website Toolbox Inc.
> http://www.websitetoolbox.com
> 1-800-921-7803 x211

--
Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211
Re: Problem in indexing on large data set by Dataimporthandler in solr
Can you give it a shot on Solr 1.4 instead? DIH has had numerous
enhancements/fixes since 1.3.

	Erik

On Feb 1, 2010, at 8:42 AM, Vijayant Kumar wrote:

Hi,

I am trying to index some large set of data in solr using dataimporthandler.
It is working fine for a small set, but when I try to index a large set it
produces an error.

I am using solr version 1.3 and mysql version Ver 14.7 Distrib 4.1.20, for
redhat-linux-gnu (i686)

Earlier I was facing a java heap space error:

Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space

Got it solved by allocating java -Xmx512M -Xms512M -jar start.jar

But now it is giving a mysql communication link failure error:

com.mysql.jdbc.CommunicationsException: Communications link failure due to
underlying exception:

** BEGIN NESTED EXCEPTION **
java.io.EOFException
STACKTRACE:
java.io.EOFException
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1913)

I had added transactionIsolation="TRANSACTION_READ_COMMITTED"
holdability="CLOSE_CURSORS_AT_COMMIT" as suggested by the solr wiki page, but
it is still not working.

Here is my data-config.xml file

My table contains around 1.5 crore of data on the local machine. On production
it contains around 4 times the data as on local.

Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211

--
Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211
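For context, the data-config attributes quoted in the original mail fit into a DIH configuration along these lines. This is a hedged sketch: the attribute values are taken from the mail, while the driver name and the entity/field names are illustrative assumptions. batchSize="-1" is the usual way to make the MySQL driver stream rows instead of buffering the whole result set.

```xml
<!-- Sketch of a DIH data-config for streaming a large MySQL result set.
     Attribute values from the original mail; driver, entity and field
     names are illustrative. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/databasename"
              user="xxx" password=""
              batchSize="-1" readOnly="true" autoCommit="false"
              transactionIsolation="TRANSACTION_READ_COMMITTED"
              holdability="CLOSE_CURSORS_AT_COMMIT"/>
  <document>
    <entity name="item" query="select ... from tablename"
            transformer="TemplateTransformer">
      <field column="id" name="id"/>
      <!-- further field mappings -->
    </entity>
  </document>
</dataConfig>
```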
Re: Wildcard Search and Filter in Solr
hey thanks ravi, ahmed and Erik for your reply. though it's tough to change my
solr version, still let me try out 1.4 and see.

Erik Hatcher-4 wrote:
>
> Note that the query analyzer output is NOT doing query _parsing_, but
> rather taking the string you passed and running it through the query
> analyzer only. When using the default query parser, Inte* will be a
> search for terms that begin with "inte". It is odd that you're not
> finding it. But you're using a pretty old version of Solr and quite
> likely something here has been fixed since.
>
> Give Solr 1.4 a try.
>
> 	Erik
>
> On Jan 27, 2010, at 12:56 AM, ashokcz wrote:
>
>> Hi, just looked at the analysis.jsp and found out what it does during
>> index / query:
>>
>> Index Analyzer
>> Intel
>> intel
>> intel
>> intel
>> intel
>> intel
>>
>> Query Analyzer
>> Inte*
>> Inte*
>> inte*
>> inte
>> inte
>> inte
>> int
>>
>> I think somewhere my configuration or my definition of the type "text"
>> is wrong. This is my configuration:
>>
>> class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
>> generateWordParts="1"/>
>>
>> ignoreCase="true"
>> synonyms="synonyms.txt"/>
>>
>> class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
>> generateWordParts="1"/>
>>
>> I think i am missing some basic configuration for doing wildcard
>> searches, but could not figure it out.
>> can someone help please
>>
>> Ahmet Arslan wrote:
>>>
>>> > Hi, I'm trying to use wildcard keywords in my search term and filter
>>> > term, but I didn't get any results. Searched a lot but could not find
>>> > any lead. Can someone help me with this?
>>> > i'm using solr 1.2.0 and have a few records indexed with vendorName
>>> > value as Intel. In the solr admin interface i'm trying to do the
>>> > search like this:
>>> > http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>> > and i'm getting the result properly, but when i use q=inte* no
>>> > records are returned. the same is the case for Filter Query: on using
>>> > &fq=VendorName:"Intel" i get my results, but on using
>>> > &fq=VendorName:"Inte*" no results are returned. I can guess i'm
>>> > making a mistake in a few obvious things, but could not figure it
>>> > out. Can someone pls help me out :) :)
>>>
>>> If &q=intel returns documents while q=inte* does not, it means that the
>>> fieldType of your defaultSearchField is reducing the token intel into
>>> something.
>>>
>>> Can you find out by using /admin/analysis.jsp what happens to "Intel
>>> intel" at index and query time?
>>>
>>> What is your defaultSearchField? Is it VendorName?
>>>
>>> It is expected that &fq=VendorName:Intel returns results while
>>> &fq=VendorName:Inte* does not, because prefix queries are not analyzed.
>>>
>>> But it is strange that q=inte* does not return anything. Maybe your
>>> index analyzer is reducing Intel into int or ıntel?
>>>
>>> I am not 100% sure, but solr 1.2.0 may use the default locale in the
>>> lowercase operation. What is your default locale?
>>>
>>> It is better to see what happens to the word Intel using the
>>> analysis.jsp page.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27334486.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context:
http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27405151.html
Sent from the Solr - User mailing list archive at Nabble.com.
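Ahmet's locale point is easy to demonstrate in plain Java: String.toLowerCase with a Turkish locale maps the capital I differently, so a term indexed as "Intel" under a Turkish default locale becomes "ıntel" and no longer matches an "inte*" prefix. A minimal standalone demo:

```java
import java.util.Locale;

public class LocaleLowercaseDemo {
    public static void main(String[] args) {
        // English lowercasing behaves as expected.
        System.out.println("Intel".toLowerCase(Locale.ENGLISH)); // intel
        // Turkish maps 'I' (U+0049) to dotless 'ı' (U+0131), so the
        // indexed term would not start with "inte" at all.
        System.out.println("Intel".toLowerCase(new Locale("tr", "TR"))); // ıntel
    }
}
```

This is why analyzers that lowercase should do so with a fixed locale rather than the JVM default.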
What does version=2.2 refer to in the solr admin url
Hi All, I'm using the solr admin to try out some search results. I was trying
with solr 1.2 and then checking with solr 1.4, and got a doubt on looking at
the url parameters: i could see version=2.2 in the url. what does that version
refer to?? just curious to know about it :) :)

--
View this message in context:
http://old.nabble.com/What-is-the-version%3D2.2-refers-in-solr-admin-url-tp27405274p27405274.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: solr - katta integration
Hi Jason,

I looked at your way of integrating katta into solr in issue 1395, and I was
trying to understand the architecture of the whole setup. I understand that
Solr+Katta nodes talk to each other via Hadoop RPC (provided by Katta). Is the
real search being taken care of by Solr, with the Solr nodes communicating via
Hadoop RPC? Which node takes care of the original client Solr http request
(the one with the shards parameter)? Does this node communicate with the other
Solr+Katta nodes via Hadoop RPC? So there is only one http request, to the
first Solr+Katta node, and it gets propagated to the other nodes in the
cluster via RPC, and Solr again takes care of the search?

It would be great if you could shed some light on this.

thanks
Sudershan Reddy

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, January 29, 2010 2:48 AM
To: solr-user@lucene.apache.org; vsre...@huawei.com
Subject: Re: solr - katta integration

Hi Reddy,

What's the limitation you're running into?

Jason

On Thu, Jan 28, 2010 at 2:15 AM, V SudershanReddy wrote:
> Hi,
>
> Can we integrate solr with katta?
>
> In order to overcome the limitations of Solr in distributed search, I need
> to integrate katta with solr, without losing any features of Solr.
>
> Any suggestions?
>
> Any help appreciated.
>
> Thanks,
> Sudharshan
weird behaviour when setting negative boost with bq using dismax
I already asked about this long ago but the answer doesn't seem to work... I
am trying to set a negative query boost to send the results that match
field_a:54 to a lower position. I have tried it in 2 different ways:

bq=(*:* -field_a:54^1)
bq=-field_a:54^1

Neither of them seems to work. What seems to happen is that results matching
field_a:54 are excluded, just like doing:

fq=-field_a:54

Any idea what could be happening? Has anyone experienced this behaviour
before?

Thanks in advance

--
View this message in context:
http://old.nabble.com/weird-behabiour-when-setting-negative-boost-with-bq-using-dismax-tp27406614p27406614.html
Sent from the Solr - User mailing list archive at Nabble.com.
Fwd: machine tags, copy fields and pattern tokenizers
Hi,

Just a quick note to mention that I finally figured (most of) this out.

The short version is that if there's an explicit "index" analyzer (as in
type="index") but not a corresponding "query" analyzer, then Solr appears to
use the first for all cases. I guess this makes sense, but it's a bit
confusing, so if I get a few minutes I will update the wiki to make the
distinction explicit.

The longer version is over here, for anyone interested:
http://github.com/straup/solr-machinetags

The long version is me asking a couple more questions:

# All the questions assume the following schema.xml:
# http://github.com/straup/solr-machinetags/blob/master/conf/schema.xml

Because all the values for a given namespace/predicate field get indexed in
the same multiValue bucket, the faceting doesn't behave the way you'd
necessarily expect. For example, if you index the following...

solr.add([{'id': int(time.time()), 'body': 'float thing',
           'tag': 'w,t,f', 'machinetag': 'dc:number=12345'}])

solr.add([{'id': int(time.time()), 'body': 'decimal thing',
           'tag': 'a,b,c', 'machinetag': 'dc:number=123.23'}])

solr.add([{'id': int(time.time()), 'body': 'negative thing',
           'tag': 'a,b,c',
           'machinetag': ['dc:number=-45.23', 'asc:test=rara']}])

...and then facet on the predicates for ?q=ns:dc (basically to ask: show me
all the predicates for the "dc:" namespace) you end up with...

"facet_fields":{
  "ns":[
    "asc",1,
    "dc",1]},

...which seems right from a Solr perspective but isn't really a correct
representation of the machine tags.

Can anyone offer any ideas on a better/different way to model this data?

Also, has anyone figured out how to match on double quotes inside a regular
expression defined in an XML attribute?
As in:

pattern="^(?:(?:[a-zA-Z]|\d)(?:\w+)?)\:(?:(?:[a-zA-Z]|\d)(?:\w+)?)=(.+)"
group="1" />

Where that pattern should really end: =\"?(.+)\"?$

Thanks,

-------- Original Message --------
Subject: machine tags, copy fields and pattern tokenizers
Date: Mon, 25 Jan 2010 16:20:58 -0800
From: straup
Reply-To: str...@gmail.com
To: solr-user@lucene.apache.org

Hi,

I am trying to work out how to store, query and facet machine tags [1] in
Solr using a combination of copy fields and pattern tokenizer factories. I am
still relatively new to Solr, so despite feeling like I've gone over the
docs, and friends, it's entirely possible I've missed something glaringly
obvious.

The short version is: Faceting works. Yay! You can facet on the individual
parts of a machine tag (namespace, predicate, value) and it does what you'd
expect. For example:

?q=*:*&facet=true&facet.field=mt_namespace&rows=0

numFound:115
foo:65
dc:48
lastfm:2

The longer version is: Even though faceting seems to work, I can't query (as
in ?q=) on the individual fields. For example, if a single "machinetag"
(foo:bar=example) field is copied to "mt_namespace", "mt_predicate" and
"mt_value" fields, I still can't query for "?q=mt_namespace:foo".

It appears as though the entire machine tag is being copied to mt_namespace,
even though my reading of the docs is that if a group attribute is present in
a solr.PatternTokenizerFactory analyzer then only the matching capture group
will be stored. Is that incorrect?

I've included the field/fieldType definitions I'm using below. [2]

Any help/suggestions would be appreciated. Cheers,

[1] http://www.flickr.com/groups/api/discuss/72157594497877875/
[2]
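On the double-quote question: inside a double-quoted XML attribute, a literal quote can be written as the &amp;quot; entity. To sanity-check the grouping outside Solr, here is a simplified, standalone version of the pattern (the non-capturing groups collapsed, and namespace/predicate/value made into capture groups) — a hypothetical sketch for testing, not the exact tokenizer config:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MachineTagDemo {
    // Simplified machinetag pattern: namespace, predicate and value as
    // groups 1-3, with an optional surrounding double quote on the value.
    static final Pattern MT =
        Pattern.compile("^([a-zA-Z\\d]\\w*):([a-zA-Z\\d]\\w*)=\"?(.+?)\"?$");

    public static void main(String[] args) {
        Matcher m = MT.matcher("dc:number=12345");
        if (m.matches()) {
            System.out.println(m.group(1)); // dc
            System.out.println(m.group(2)); // number
            System.out.println(m.group(3)); // 12345
        }
        // The reluctant (.+?) plus the optional closing quote means a
        // quoted value comes back without its quotes:
        Matcher q = MT.matcher("dc:title=\"foo\"");
        if (q.matches()) {
            System.out.println(q.group(3)); // foo
        }
    }
}
```

In the schema attribute the same quote would be spelled =&amp;quot;?(.+?)&amp;quot;?$.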
Re: Contributors - Solr in Action Case Studies
Hello everyone,

Thanks to all who emailed me so far. This is just another reminder for those
who missed the first email below. Please let us know if you'd like to
contribute a piece to Solr in Action about your interesting use of Solr.

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message -----
> From: Otis Gospodnetic
> To: solr-user@lucene.apache.org
> Sent: Thu, January 14, 2010 2:09:41 PM
> Subject: Contributors - Solr in Action Case Studies
>
> Hello,
>
> We are working on Solr in Action [1]. One of the well received chapters
> from LIA #1 [2] was the Case Studies chapter, where external contributors
> described how they used Lucene. We are getting good feedback about this
> chapter from LIA #2 reviewers, too.
>
> Solr in Action also has a Case Studies chapter, and we are starting to look
> for contributors.
>
> If you are using Solr in some clever, interesting, or unusual way and are
> willing to share this information, please get in touch. 5 to max 10 pages
> (soft limits) per study is what we are hoping for. Feel free to respond on
> the list or reply to me directly.
>
> [1] http://www.manning.com/catalog/undercontract.html
> [2] http://www.manning.com/hatcher2/ and http://www.manning.com/hatcher3/
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
DataImportHandler delta-import confusion
First, let me just say that DataImportHandler is fantastic. It got my old
mysql-php-xml index rebuild process down from 30 hours to 6 minutes.

I'm trying to use the delta-import functionality now but failing miserably.
Here's my entity tag (some SELECT statements reduced to increase readability):

deltaQuery="select moment_id from moments where date_modified >
'${dataimporter.last_index_time}'"
deltaImportQuery="select [bunch of stuff] WHERE m.moment_id =
'${dataimporter.delta.MOMENTID}'"
pk="MOMENTID"
transformer="TemplateTransformer">

When I look at the MySQL query log, I see the date-modified query running
fine and returning 3 rows. The deltaImportQuery, however, does not have the
proper primary key in the where clause; it's just blank. I also tried
changing it to ${moment.MOMENTID}. I don't really get the relation between
the pk field and the ${dataimport.delta.whatever} stuff.

Help please!

-jsd-
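One likely culprit (a hedged guess, not a confirmed diagnosis): ${dataimporter.delta.X} is resolved against the column names returned by deltaQuery, and the deltaQuery above returns a column called moment_id, not MOMENTID. Aliasing the key so the names line up would look like this; entity and column names follow the mail, everything else is illustrative:

```xml
<!-- Sketch: alias the delta key so the column returned by deltaQuery
     matches the name referenced in deltaImportQuery and pk. -->
<entity name="moment" pk="MOMENTID"
        transformer="TemplateTransformer"
        deltaQuery="select moment_id as MOMENTID from moments
                    where date_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select [bunch of stuff]
                          where m.moment_id = '${dataimporter.delta.MOMENTID}'">
</entity>
```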
query on not stored field
Hi,

on the following field

[...]
[...]

the following query works:

{!lucene q.op=AND} [...] AND (status.message&STRING_ANALYZED_NO_US:(some
keywords) AND [...]

I was wondering if the query syntax above works as well if the store property
of the field is set to NO.

[...]
[...]

I have tried it and it seems to work. I would appreciate it if someone could
confirm!

Thank you
Re: query on not stored field
Neither index="analyzed" nor store="yes" is parsed by the Solr schema. Use
indexed and stored instead of index and store, and set them to either "true"
or "false".

Koji
--
http://www.rondhuit.com/en/
Re: query on not stored field
First of all, the schema snippets you provided aren't right. It's
indexed="true", not index="analyzed". And it's stored, not store.

But, to answer your question, the stored nature of the field has nothing
whatsoever to do with its searchability. Stored only affects whether you can
get that value back in the documents returned from a search, or not.

	Erik

On Feb 1, 2010, at 7:12 PM, Matthieu Labour wrote:

Hi,

on the following field

[...]
[...]

the following query works:

{!lucene q.op=AND} [...] AND (status.message&STRING_ANALYZED_NO_US:(some
keywords) AND [...]

I was wondering if the query syntax above works as well if the store property
of the field is set to NO.

[...]
[...]

I have tried it and it seems to work. I would appreciate it if someone could
confirm!

Thank you
Indexing a oracle warehouse table
Hello all, hope someone can point me in the right direction.

I am trying to index an oracle warehouse table (TableA) with 850 columns. Out
of the structure, about 800 fields are CLOBs and are good candidates for
full-text searching. I also have a few columns with relational links to other
tables. I am clear on how to create a root entity and then pull data from the
other relational links as child entities.

Most columns in TableA are named field1, field2 ... field800. Now my question
is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define a field
entry for each of those 800 columns? Seems cumbersome. Maybe I can write a
script to generate the XML instead of handwriting it in both data-config.xml
and schema.xml.

OR: don't define any, so that the column in SOLR will be the same as in the
database table. But the questions then are: 1) How do I define the unique
field in this scenario? 2) How do I copy all the text fields to a common
field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.

Thanks

--
View this message in context:
http://old.nabble.com/Indexing-a-oracle-warehouse-table-tp27414263p27414263.html
Sent from the Solr - User mailing list archive at Nabble.com.
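Since the columns share the field1...field800 naming pattern, one way to avoid hand-writing 800 entries is a dynamicField plus a copyField into a single catch-all. A hedged schema.xml sketch — the field and type names here are illustrative assumptions:

```xml
<!-- Catch the generated columns with one dynamicField, keep a separate
     unique key, and copy everything into one searchable field. -->
<fields>
  <field name="id" type="string" indexed="true" stored="true"
         required="true"/>
  <dynamicField name="field*" type="text" indexed="true" stored="false"/>
  <field name="alltext" type="text" indexed="true" stored="false"
         multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<copyField source="field*" dest="alltext"/>
```

This answers both questions at once: the unique key is its own explicitly declared field, and the copyField wildcard funnels all 800 CLOB columns into alltext for searching.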
Re: query on not stored field
Koji, Erik,

Thank you for your reply. One more question: what about a field that is both
indexed="false" and stored="false" ... does it have an impact on solr,
meaning is it being ignored by solr/lucene? Is it as if the field was not
being passed?

Thank you!

--- On Mon, 2/1/10, Erik Hatcher wrote:

From: Erik Hatcher
Subject: Re: query on not stored field
To: solr-user@lucene.apache.org
Date: Monday, February 1, 2010, 6:32 PM

First of all, the schema snippets you provided aren't right. It's
indexed="true", not index="analyzed". And it's stored, not store.

But, to answer your question, the stored nature of the field has nothing
whatsoever to do with its searchability. Stored only affects whether you can
get that value back in the documents returned from a search, or not.

	Erik

On Feb 1, 2010, at 7:12 PM, Matthieu Labour wrote:
> Hi,
>
> on the following field
>
> [...]
> [...]
>
> the following query works:
>
> {!lucene q.op=AND} [...] AND (status.message&STRING_ANALYZED_NO_US:(some
> keywords) AND [...]
>
> I was wondering if the query syntax above works as well if the store
> property of the field is set to NO.
>
> [...]
> [...]
>
> I have tried it and it seems to work. I would appreciate it if someone
> could confirm!
>
> Thank you
Re: query on not stored field
On Feb 1, 2010, at 8:45 PM, Matthieu Labour wrote:
> What about a field that is both indexed="false" stored="false" ... does it
> have an impact on solr, meaning is it being ignored by solr/lucene? Is it
> as if the field was not being passed?

Yes, that's a trick in Solr to ignore a field. The example schema actually
even includes a fieldtype called "ignored" with these settings too.

	Erik
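The "ignored" type Erik mentions looks roughly like this in the example schema (reproduced from memory, so treat it as a sketch); pairing it with a catch-all dynamicField silently drops any field the schema doesn't otherwise declare:

```xml
<!-- Sketch of the example schema's "ignored" type: neither indexed nor
     stored, so matching fields are accepted but discarded. -->
<fieldType name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField"/>

<dynamicField name="*" type="ignored" multiValued="true"/>
```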
Solr and location based searches
Hi all,

We want to use Solr because of its facet-based functionality... now the
customer wants to combine searches based on facets with location-based
searches (all objects 10 miles around a particular zip)... Is this possible
in Solr, or is there no way?

Thanks and best regards,
Sandro
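For readers hitting the same question: Solr at this point (1.4) has no built-in geo filtering; spatial support came from plugins such as LocalSolr/LocalLucene, or from filtering client-side. For the latter, the "within N miles of a zip's centroid" check is the standard haversine great-circle distance; a minimal sketch (class and method names are illustrative):

```java
public class GeoDistance {
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Great-circle distance between two lat/lon points via the haversine
    // formula, e.g. for post-filtering search results around a zip code.
    static double distanceMiles(double lat1, double lon1,
                                double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // New York to Los Angeles, roughly 2,450 miles great-circle.
        System.out.println(distanceMiles(40.7128, -74.0060, 34.0522, -118.2437));
    }
}
```

Combined with facets, one workable pattern is to facet as usual and keep only hits whose stored lat/lon fall within the radius.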