Hi,
Has anyone gotten Solr to schedule data imports at a certain time interval
through configuring Solr?
I tried setting interval=1, which should mean an import every minute, but I
don't see it happening.
I'm trying to avoid cron jobs.
Thanks,
Tri
Does anyone have any idea how to do this?
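As far as I know, stock Solr does not ship a built-in import scheduler, so if configuring an interval doesn't take effect, one cron-free workaround is a small JDK scheduler that periodically hits the DataImportHandler endpoint. A minimal sketch, assuming Java 8+, the default DIH path, and a one-minute interval (the URL, command, and interval are assumptions):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DihScheduler {
  public static void main(String[] args) {
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
      try {
        // Fire a delta-import against the DataImportHandler every minute.
        URL url = new URL("http://localhost:8983/solr/dataimport?command=delta-import");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.getResponseCode(); // trigger the request; the response body is ignored
        con.disconnect();
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 1, TimeUnit.MINUTES);
  }
}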
Jonathan,
thanks for your statement. In fact, you are quite right: a lot of people
have developed great caching mechanisms.
However, the solution I had in mind was something like an HTTP cache - in
most cases on the same box.
I talked to some experts who told me that Squid would be a relatively large
Hi! Sorry for such a break, but I was moving house... anyway:
1. I took the
~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
file and modified it (renamed to StempelFilterFactory.java) in Vim as
follows:
package org.getopt.solr.analysis;
import org.apache.lucene.analysis.T
Hi,
I am trying to index documents (PDF, Doc, XLS, RTF) using the
ExtractingRequestHandler.
I am following the tutorial at
http://wiki.apache.org/solr/ExtractingRequestHandler
But when I run the following command:
curl
"http://localhost:8983/solr/update/extract?literal.id=mydoc.doc&uprefix=
Hello,
I'd like to use solr to index some documents coming from an rss feed,
like the example at [1], but it seems that the configuration used
there is just for a one-time indexing, trying to get all the articles
exposed in the rss feed of the website.
Is it possible to manage and index just the n
Does anyone know what technology they are using: http://www.indextank.com/
Is it Lucene under the hood?
Thanks, and apologies for cross-posting.
-Glen
http://zzzoot.blogspot.com
Hi,
I use Solr 1.3 with the patch for parsing rich documents, and when uploading,
for example, a PDF file, the only thing I see in solr.log is the following:
INFO: [] webapp=/solr path=/update/rich
params={id=250&stream.type=pdf&fieldnames=id,name&commit=true&stream.fieldname=body&name=iphone+user+guide+pdf+
@Jerry Li
What version of Solr were you using? And was there any
data in the new field? I have no problems here with a quick
test I ran on trunk...
Best
Erick
On Thu, Nov 11, 2010 at 1:37 AM, Jerry Li | 李宗杰 wrote:
> but if I use this field to do sorting, there will be an error
> and th
Hi, nizan. I didn't realize that just replying to a thread from my email
client wouldn't get back to you. Here's some info on this thread since your
original post:
On Nov 10, 2010, at 12:30pm, Bob Sandiford wrote:
> Why not use replication? Call it inexperience...
>
> We're really early into
I am facing this weird issue with facet fields.
Within the config XML I have defined the fl as
file_id folder_id display_name file_name priority_text content_type
last_upload upload_by business indexed
But my output XML doesn't contain the elements upload_by and business.
But i
Hi,
I have a question about boosting.
I have the following fields in my schema.xml:
1. title
2. description
3. ISBN
etc
I want to boost the field title. I tried index time boosting but it did not
work. I also tried Query time boosting but with no luck.
Can someone help me on how to implement
Hi,
Thanks for the offers, I'll take a deeper look into them.
In the offers you showed me, if I understand correctly, the call for
creation is done on the client side. I need the mechanism to work on the
server side.
I know it sounds stupid, but I need the client side not to know about
which
Hi All,
I'm having some trouble with a query using a wildcard and I was wondering if
anyone could tell me why these two
similar queries do not return the same number of results. Basically, the query
I'm making should return all docs whose title starts
with (or contains) the string "lowe'". I suspe
Hmmm. Maybe you need to define what you mean by 'server' and what you mean
by 'client'.
Hi,
Maybe I just don't understand all the concepts there and I mix up server and
client...
Client - the place where I make the HTTP calls (for index, search, etc.) -
where I use the CommonsHttpSolrServer as the Solr server. This machine isn't
defined as master or slave, it just uses Solr as a search en
I'm going down the route of patching nutch so I can use this ParseMetaTags
plugin:
https://issues.apache.org/jira/browse/NUTCH-809
Also wondering whether I will be able to use the XMLParser to allow me to
parse well-formed XHTML; using XPath would be a bonus:
https://issues.apache.org/jira/browse/N
Hi,
consider the following fieldtype (used for autocompletion):
This works fine as long as the query string is a single word. For multiple
words, the ranking is weird though.
Example:
Que
No - in reading what you just wrote, and what you originally wrote, I think
the misunderstanding was mine, based on the architecture of my code. In my
code, it is our 'server' level that does the SolrJ indexing calls, but you
meant 'server' to be the Solr instance, and what you mean by 'client' i
I've noticed that using camelCase in field names causes problems.
On 11/5/2010 11:02 AM, Will Milspec wrote:
Hi all,
we're moving from an old lucene version to solr and plan to use the "Copy
Field" functionality. Previously we had "rolled our own" implementation,
sticking title, description,
Hi Robert, All,
I have a similar problem, here is my fieldType,
http://paste.pocoo.org/show/289910/
I want to include stopword removal and lowercase the incoming terms. The idea
being to take "Foo Bar Baz Ltd" and turn it into "foobarbaz" for the EdgeNGram
filter factory.
If anyone can tell me
Hi, all
I have a question about Solr and SolrJ's rollback.
I try to rollback like below:
try {
  server.addBean(dto);
  server.commit();
} catch (Exception e) {
  if (server != null) { server.rollback(); }
}
I wonder whether, if any Exception is thrown, the "rollback" process runs, so
that all the data would not be updated.
but
What you say is true. Solr is not an rdbms.
Kouta Osabe wrote:
Hi, all
I have a question about Solr and SolrJ's rollback.
I try to rollback like below
try{
server.addBean(dto);
server.commit;
}catch(Exception e){
if (server != null) { server.rollback();}
}
I wonder if any Exception thrown,
I am exploring support for Japanese language in solr.
Solr seems to provide CJKTokenizerFactory.
How useful is this module? Has anyone been using this in production for
Japanese language?
One shortfall it seems to have, from what I have been able to read up on, is
that it can generate a lot of false m
You can add an additional field, using KeywordTokenizerFactory instead of
WhitespaceTokenizerFactory, and query both these fields with an OR operator:
edgytext:(Bill Cl) OR edgytext2:"Bill Cl"
You can even apply a boost so that begins-with matches come first.
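For reference, a minimal SolrJ sketch of that two-field query with the boost (this assumes a recent SolrJ client rather than the 1.4-era CommonsHttpSolrServer, and the core URL is an assumption):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AutocompleteQuery {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
    // Boost the keyword-tokenized field so whole-prefix matches rank first.
    SolrQuery q = new SolrQuery("edgytext:(Bill Cl) OR edgytext2:\"Bill Cl\"^2.0");
    QueryResponse rsp = client.query(q);
    rsp.getResults().forEach(doc -> System.out.println(doc));
    client.close();
  }
}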
--- On Thu, 11/11/10, Robert G
Are you storing the upload_by and business fields? You will not be able to
retrieve a field from your index if it is not stored. Check that you have
stored="true" for both of those fields.
- Paige
On Thu, Nov 11, 2010 at 10:23 AM, gauravshetti wrote:
>
> I am facing this weird issue in facet fie
Thanks a lot, that setup works pretty well now.
The only problem now is that the StopWords do not work that well anymore. I'll
provide an example, but first the 2 fieldtypes:
I've posted a ConcatFilter in my previous mail which does concatenate tokens.
This works fine, but I
realized that what I wanted to achieve is easier to implement in another way (by
using 2 separate field types).
Have a look at a previous mail i wrote to the list and the reply from Ahmet
Arslan (
Hello All.
My first time post, so be kind. Developing a document store with lots and lots
of very small documents (200 million at the moment. The final size will probably
be double this at 400 million documents). This is proof-of-concept development,
so we are seeing what a single core can do for us
> This setup now makes troubles regarding StopWords, here's
> an example:
>
> Let's say the index contains 2 Strings: "Mr Martin
> Scorsese" and "Martin Scorsese". "Mr" is in the stopword
> list.
>
> Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0
>
> This way, the only result i get is
Hi,
I cannot find out how this is occurring:
Nolosearch/com/search/apachesolr_search/law
You can see that the John Paul Stevens result yields more description in the
search result because of the keyword relevancy, whereas, the other results
just give you a snippet of the title ba
My Solr corpus is currently created by indexing metadata from a
relational database as well as content pointed to by URLs from the
database. I'm using a pretty generic out of the box Solr schema. The
search results are presented via an AJAX enabled HTML page.
When I perform a search the docu
Thanks Robert, I had been trying to get your ConcatFilter to work, but I'm not
sure what I need in the classpath and where Token comes from.
Will check the thread you mention.
Best
Nick
On 11 Nov 2010, at 18:13, Robert Gründler wrote:
> I've posted a ConcatFilter in my previous mail which does
this is the full source code, but be warned, I'm not a Java developer, and I
have no background in Lucene/Solr development:
// ConcatFilter
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenS
On 12 Nov 2010, at 01:46, Ahmet Arslan wrote:
>> This setup now makes troubles regarding StopWords, here's
>> an example:
>>
>> Let's say the index contains 2 Strings: "Mr Martin
>> Scorsese" and "Martin Scorsese". "Mr" is in the stopword
>> list.
>>
>> Query: edgytext:Mr Scorsese OR edgytext2
I look forward to the answers to this one.
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com
Could anyone help me understand why "Clyde Phillips" appears in the
results for "Bill Cl"?
"Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so
why is it even in the results?
Thanks.
--- On Thu, 11/11/10, Ahmet Arslan wrote:
> You can add an additional field, w
> I'm having some trouble with a query using some wildcard
> and I was wondering if anyone could tell me why these two
> similar queries do not return the same number of results.
> Basically, the query I'm making should return all docs whose
> title starts
> (or contain) the string "lowe'". I suspe
according to the fieldtype I posted previously, I think it's because of:
1. WhiteSpaceTokenizer splits the String "Clyde Phillips" into 2 tokens:
"Clyde" and "Phillips"
2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token:
"C" "Cl" "Cly" ... AND "P" "Ph" "Phi" ...
Th
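For anyone who wants to see those grams directly, here is a small standalone sketch of the whitespace + edge-ngram chain (it assumes a recent Lucene release, whose constructors differ from the 1.4-era classes discussed in this thread; the min/max gram sizes are assumptions):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class EdgeNGramDemo {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("Clyde Phillips"));
    // minGram=1, maxGram=20, do not preserve the original token
    TokenStream stream = new EdgeNGramTokenFilter(tokenizer, 1, 20, false);
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
      System.out.println(term.toString()); // C, Cl, Cly, ..., P, Ph, Phi, ...
    }
    stream.end();
    stream.close();
  }
}

The "Cl" gram emitted for "Clyde" is exactly what matches the "Cl" in "Bill Cl" when the two terms are ORed.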
We're holding a free webinar on migration from FAST to Solr. Details below.
-Yonik
http://www.lucidimagination.com
=
Solr To The Rescue: Successful Migration From FAST ESP to Open Source
Search Based on Apache Solr
Thur
On 2010-11-11, at 3:45 PM, Ahmet Arslan wrote:
>> I'm having some trouble with a query using some wildcard
>> and I was wondering if anyone could tell me why these two
>> similar queries do not return the same number of results.
>> Basically, the query I'm making should return all docs whose
>> t
Hi,
I am using a facet.prefix search with shingles in my autosuggest:
Now I would like to prevent stop words from appearing in the suggestions:
52
6
6
5
25
7
Here I would like to filter out the last 4 suggestions really. Is there a way I
> select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0
> >
> > "wildcard queries are not analyzed" http://search-lucene.com/m/pnmlH14o6eM1/
> >
>
> Yeah I found out about this a couple of minutes after I
> posted my problem. If there is no analyzer then
> why is Solr not finding any documents whe
Ah I see. Thanks for the explanation.
Could you set the defaultOperator to "AND"? That way both "Bill" and "Cl" must
match, and that would exclude "Clyde Phillips".
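If changing schema.xml is inconvenient, the same effect can be had per request. A minimal sketch, reusing the SolrJ imports from the earlier sketch in this thread and assuming an existing SolrClient named client:

SolrQuery q = new SolrQuery("edgytext:(Bill Cl)");
q.set("q.op", "AND"); // both "Bill" and "Cl" must match, which excludes "Clyde Phillips"
QueryResponse rsp = client.query(q);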
--- On Thu, 11/11/10, Robert Gründler wrote:
> From: Robert Gründler
> Subject: Re: EdgeNGram relevancy
> To: solr-user@luc
There's not much to go on here. Boosting works,
and index-time boosting as opposed to query-time boosting
addresses two different needs. Could you add some
detail? All you've really said is "it didn't work", which
doesn't allow a very constructive response.
Perhaps you could review:
http://wiki.apache.org/
Erick,
Thank you so much for the reply, and I apologize for not providing all the
details.
The following are the field definitions in my schema.xml:
Copy Fields:
searchFields
Before creating the indexes I feed XML
There are several mistakes in your approach:
copyField just copies data. Index-time boost is not copied.
There is no such boosting syntax: /select?q=Each&title^9&fl=score
You are searching on your default field.
This is not the cause of your problem, but omitNorms="true" disables index-time
b
I don't know all the implications here, but can't you just
insert the StopwordFilterFactory before the ShingleFilterFactory
and turn it loose?
Best
Erick
On Thu, Nov 11, 2010 at 4:02 PM, Lukas Kahwe Smith wrote:
> Hi,
>
> I am using a facet.prefix search with shingle's in my autosuggest:
> p
(10/11/12 1:49), Kumar Pandey wrote:
I am exploring support for Japanese language in solr.
Solr seems to provide CJKTokenizerFactory.
How useful is this module? Has anyone been using this in production for
Japanese language?
CJKTokenizer is used in a lot of places in Japan.
One shortfall it s
On 11.11.2010, at 17:42, Erick Erickson wrote:
> I don't know all the implications here, but can't you just
> insert the StopwordFilterFactory before the ShingleFilterFactory
> and turn it loose?
I haven't tried this, but I would suspect that I would then get in trouble with
stuff like "united st
>
> Did you run your query without using () and "" operators? If yes can you try
> this?
> &q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0
I didn't use () and "" in my query before. Using the query with those operators
works now; stopwords are thrown out as they should be, thanks.
However,
Without the parens, the "edgytext:" only applied to "Mr"; the default
field still applied to "Scorcese".
The double quotes are necessary in the second case (rather than parens)
because, on a non-tokenized field, the standard query parser will
"pre-tokenize" on whitespace before sending
Hi,
If you are looking for query time boosting on title field you can do
the following:
/select?q=title:android^10
Also unless you have a very good reason to use string for date data
(in your case pubdate and reldate), you should be using
solr.DateField.
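A hedged SolrJ equivalent, in case it helps (the core URL is an assumption, and the commented edismax variant is just one common way to boost a field at query time, not the only one):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TitleBoostQuery {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/books").build();
    // Equivalent of /select?q=title:android^10
    SolrQuery q = new SolrQuery("title:android^10");
    // Alternative: boost the title field via edismax
    // q = new SolrQuery("android");
    // q.set("defType", "edismax");
    // q.set("qf", "title^10 description");
    System.out.println(client.query(q).getResults().getNumFound());
    client.close();
  }
}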
regards,
Ram
On Fri, Nov 12, 2010 at 3:41
Hi again,
we're coming closer to the rollout of our newly created Solr/Lucene-based
search, and I'm wondering
how people handle changes to their schema on live systems.
In our case, we have 3 cores (ie. A,B,C), where the largest one takes about 1.5
hours for a full dataimport from the relation
You can do a similar thing to your case #1 with Solr replication,
handling a lot of the details for you instead of you manually switching
cores and such. Index to a new core, then tell your production solr to
be a slave replicating from that master new core. It still may have some
of the same d
If by "corrupt index" you mean an index that's just not quite
up to date, could you do a delta import? In other words, how
do you make our Solr index reflect changes to the DB even
without a schema change? Could you extend that method
to handle your use case?
So the scenario is something like this
I just upgraded to a later version of the trunk and noticed my
geofilter queries stopped working, apparently because the sfilt
function was renamed to geofilt.
I realize trunk is not stable, but other than looking at every change,
is there an easy way to find changes that are not backward compatib
On Thu, Nov 11, 2010 at 8:21 AM, Matteo Moci wrote:
> Hello,
> I'd like to use solr to index some documents coming from an rss feed,
> like the example at [1], but it seems that the configuration used
> there is just for a one-time indexing, trying to get all the articles
> exposed in the rss feed
On Thu, Nov 11, 2010 at 10:35 AM, Solr User wrote:
> Hi,
>
> I have a question about boosting.
>
> I have the following fields in my schema.xml:
>
> 1. title
> 2. description
> 3. ISBN
>
> etc
>
> I want to boost the field title. I tried index time boosting but it did not
> work. I also tried Quer
Hi,
Not sure if this is the correct place to post but I'm looking for someone to
help finish a Solr install on our LAMP based website. This would be a paid
project.
The programmer that started the project got too busy with his full-time job to
finish the project. Solr has been installed
Hello,
Does anyone know where to download the Solr 4.0 source?
I tried downloading from this page:
http://wiki.apache.org/solr/FrontPage#solr_development
but the link is not working...
Best,
Deche
Hi,
I'm restricted to the following with regard to importing.
I have access to a list (Iterator) of Java objects I need to import into Solr.
Can I import the Java objects as part of Solr's data import interface (whenever
an HTTP request to Solr triggers a dataimport, it'll call my Java class to get
Hi, Kouta:
No data store supports rollback AFTER commit; rollback works only
BEFORE.
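A minimal SolrJ sketch of that point (client and dto here are assumed to be an existing SolrClient/SolrServer instance and the bean being indexed):

try {
  client.addBean(dto);   // buffered on the Solr side, not yet committed
  client.commit();       // once this succeeds, rollback can no longer undo it
} catch (Exception e) {
  // rollback() only discards documents added or deleted since the last commit
  if (client != null) client.rollback();
}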
On Friday, November 12, 2010 12:34:18 am Kouta Osabe wrote:
> Hi, all
>
> I have a question about Solr and SolrJ's rollback.
>
> I try to rollback like below
>
> try{
> server.addBean(dto);
> server.c
In some cases you can rollback to a named checkpoint. I am not too sure but
I think I read in the lucene documentation that it supported named
checkpointing.
On Thu, Nov 11, 2010 at 7:12 PM, gengshaoguang wrote:
> Hi, Kouta:
> Any data store does not support rollback AFTER commit, rollback works
Hi,
Pardon me if this sounds very elementary, but I have a very basic question
regarding Solr search. I have about 10 storage devices running Solaris with
hundreds of thousands of text files (there are other files, as well, but my
target is these text files). The directories on the Solaris boxes a
Another question is: can I write my own DataImportHandler class?
thanks,
Tri
From: Tri Nguyen
To: solr user
Sent: Thu, November 11, 2010 7:01:25 PM
Subject: importing from java
Hi,
I'm restricted to the following in regards to importing.
I have access to
http://wiki.apache.org/solr/DIHQuickStart
http://wiki.apache.org/solr/DataImportHandlerFaq
http://wiki.apache.org/solr/DataImportHandler
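Those links cover the DIH route. As a hedged alternative (the class name and URL below are hypothetical), the same Iterator of Java objects can also be pushed straight through SolrJ without touching DIH at all:

import java.util.Iterator;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ObjectImporter {
  // Pushes any iterator of @Field-annotated beans into Solr.
  public static <T> void importAll(Iterator<T> objects) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
    try {
      while (objects.hasNext()) {
        client.addBean(objects.next());
      }
      client.commit();
    } finally {
      client.close();
    }
  }
}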
-Original Message-
From: Tri Nguyen [mailto:tringuye...@yahoo.com]
Sent: Thursday, November 11, 2010 9:34 PM
To: solr-user@lucene.apache.org
Subject: R
Oh, Pardeep:
I don't think Lucene is an advanced storage app that supports rollback to a
historical checkpoint (which would be supported only in a distributed system, such
as two-phase commit or transactional web services)
yours
On Friday, November 12, 2010
On 11/11/2010 4:45 PM, Robert Gründler wrote:
So far, I can only think of 2 scenarios for rebuilding the index, if we need to
update the schema after the rollout:
1. Create 3 more cores (A1,B1,C1) - Import the data from the database - After
importing, switch the application to cores A1, B1, C
On 11/11/2010 7:44 PM, Deche Pangestu wrote:
Hello,
Does anyone know where to download solr4.0 source?
I tried downloading from this page:
http://wiki.apache.org/solr/FrontPage#solr_development
but the link is not working...
Your best bet is to use svn.
http://lucene.apache.org/solr/version_con