I have been a satisfied DIH user for a long time.
The project I use Solr for runs on MySQL 5.1. There are 6
Solr cores in total with a combined index size of 12G. The database design
is as relational as it can get, and writing SQL queries to fetch the data
has always been a prob
On Fri, Aug 7, 2009 at 11:15 AM, Amit Nithian wrote:
> All,
> An off and on project of mine has been to work on refactoring the way we
> load data from MySQL into Solr. Our current approach is fairly hard coded
> and not configurable as I would like. I was curious of people who have used
> the DIH
All,
An off and on project of mine has been to work on refactoring the way we
load data from MySQL into Solr. Our current approach is fairly hard coded
and not configurable as I would like. I was curious of people who have used
the DIH and/or LuSQL to load data into Solr, how much data you typicall
Have you tried setting solr home via the JNDI? I think you can set it via
solr/home but that would require adding this to your servlet context
configuration.
Another option is to trace the startup scripts for Glassfish and see what
environment variables are passed in. JAVA_OPTS would make sense but
Then you should consider replicating the index to the local intranet
and still run it as a separate app.
Will it be the same master-slave replication? If the master is
multicore, can I specifically replicate the index of a certain core? Thanks
for the help.
2009/8/7 Noble Paul നോബിള് नोब
Then you should consider replicating the index to the local intranet
and still run it as a separate app.
On Fri, Aug 7, 2009 at 10:53 AM, Ninad Raut wrote:
> The remote web app will be accessing the Solr server via the internet. It's not
> an intranet setup.
>
> On Fri, Aug 7, 2009 at 10:19 AM, Walt
The remote web app will be accessing the Solr server via the internet. It's not
an intranet setup.
On Fri, Aug 7, 2009 at 10:19 AM, Walter Underwood wrote:
> About the first option, caches are more effective with more traffic, so ten
> front end servers using three Solr servers will have better caching
About the first option, caches are more effective with more traffic,
so ten front end servers using three Solr servers will have better
caching and probably better overall performance than having separate
search on all ten servers. You can even put an HTTP cache in there and
get better cach
>
> params.setQuery(queryString);
>
The query string is "*:*", right?
Your id field is sortable, right?
Cheers
Avlesh
On Fri, Aug 7, 2009 at 5:58 AM, Reuben Firmin wrote:
> I'm using SolrJ. When I attempt to set up a query to retrieve the maximum
> id
> in the index, I'm getting an exception.
Bradford,
If I may:
Have a look at http://www.sematext.com/products/language-identifier/index.html
And/or http://www.sematext.com/products/multilingual-indexer/index.html
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,
Hi Noble,
Can you explain a bit more how to use Solr "out of the box"? I am
looking at ways to design the UI for the remote application quickly and with
fewer problems.
Also, could you elaborate more on what can go wrong with the first option?
Thanks.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> On Thu, A
Chris Hostetter wrote:
: I need to tokenize my field on whitespace, html, punctuation, apostrophes
: but if I use HTMLStripStandardTokenizerFactory it strips only html
: and not apostrophes
you might consider using one of the HTML Tokenizers, and then use a
PatternReplaceFilterFactory ...
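A sketch of what such a chain could look like in schema.xml (the field type name and the pattern are assumptions, not a tested configuration):

```xml
<!-- Hypothetical field type: the tokenizer strips HTML markup, then a
     PatternReplaceFilterFactory removes apostrophes from each token. -->
<fieldType name="text_html_noapos" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

With replace="all" every apostrophe in a token is removed, so "don't" would index as "dont"; adjust the pattern if you want to split on the apostrophe instead.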
It looks like you export JAVA_OPTS in your .profile, but I bet Tomcat also sets
and thus overrides this same JAVA_OPTS in its own startup script. So that is
what you should edit and modify. I'm a Jetty user, so I don't have a Tomcat
startup script to check for you.
Otis
--
Sematext is hiring
Dynamic fields might be an answer. If you had a field called "product_*" and
these were populated with the corresponding values during indexing then
faceting on these fields will give you the desired behavior.
The only catch here is that the product names have to be known upfront. A
wildcard suppo
I have tried that, but it did not work either!
The goal of setting solr.home in Tomcat 6 is to start Solr when Tomcat 6
starts.
So I think the problem is that Solr cannot start because solr.home is not
set when Glassfish starts.
Chantal Ackermann wrote:
>
>
> You have to quote values that i
I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id
in the index, I'm getting an exception.
My setup code is:
final SolrQuery params = new SolrQuery();
params.addSortField("id", ORDER.desc);
params.setRows(1);
params.setQuery(queryString);
fi
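For reference, a sketch of the HTTP request the setup above amounts to (host, core name, and the helper class are illustrative; plain JDK, no SolrJ needed) -- the point being that q must be a real query such as *:*, and rows=1 with sort=id desc returns only the max-id document:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MaxIdQuery {
    // Builds the query-string equivalent of the SolrJ setup above:
    // match all documents, sort by id descending, return a single row.
    static String buildQuery(String host, String core) {
        String q = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        String sort = URLEncoder.encode("id desc", StandardCharsets.UTF_8);
        return host + "/" + core + "/select?q=" + q + "&sort=" + sort + "&rows=1";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("http://localhost:8983/solr", "core0"));
    }
}
```
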
There is a patch for it:
https://issues.apache.org/jira/browse/SOLR-64
Koji
Jón Helgi Jónsson wrote:
Did a bit more creative searching for a solution and came up with this:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html
I'm using a couple-of-days-old nightly build, so u
Google Translate just released (last week) its language API with translation
and LANGUAGE DETECTION.
:)
It's very simple to use, and you can query it with some text to detect which
language it is.
Here is a simple example using groovy, but all you need is the url to
query: http://groovyconsole.ap
FYI, you can use the block property, but I think even better is to use
the unicode script property: http://unicode.org/reports/tr24/ . This
is easier because some characters are common across different scripts.
Also, some scripts span multiple unicode blocks.
This is the direction I was heading LUC
Are those 'blocks of text' (Unicode) Java strings? I don't think
this is the case, but if so, use Character.UnicodeBlock to identify the
language of the text.
And, is that just text files with unknown character encoding? Then ICU
has a 'charset detector' that you can use. This feature 'suggests'
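A minimal sketch of the Character.UnicodeBlock idea (plain Java; the class and method names are made up for illustration). Note it really identifies the script rather than the language -- Arabic, Farsi, and Urdu text will all come back as ARABIC:

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashMap;
import java.util.Map;

public class ScriptGuesser {
    // Counts which Unicode block each letter falls into and returns the
    // most frequent one -- a crude script (not language) guess.
    static UnicodeBlock dominantBlock(String text) {
        Map<UnicodeBlock, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                counts.merge(UnicodeBlock.of(cp), 1, Integer::sum);
            }
            i += Character.charCount(cp);
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(dominantBlock("مرحبا بالعالم")); // Arabic-script text
        System.out.println(dominantBlock("hello world"));   // Latin-script text
    }
}
```
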
I can't reindex because the aggregated/grouped result should change as
the query changes... in other words, the result must be dynamic.
We've been thinking about a new handler for it, something like:
/select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price)
Does i
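Until something like that handler exists, the aggregation can be done client-side over the documents the query returns. A sketch (plain Java; the Doc type and field names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ItemFacet {
    // One matching document: product name plus its price field.
    record Doc(String product, double price) {}

    // Client-side stopgap for the proposed itemfacet handler: group the
    // documents returned by the current query, keeping min/max price per product.
    static Map<String, double[]> minMaxByProduct(List<Doc> docs) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Doc d : docs) {
            double[] mm = out.computeIfAbsent(d.product(),
                    k -> new double[] { d.price(), d.price() });
            mm[0] = Math.min(mm[0], d.price());
            mm[1] = Math.max(mm[1], d.price());
        }
        return out;
    }

    public static void main(String[] args) {
        var docs = List.of(new Doc("product_name1", 10.0),
                           new Doc("product_name1", 25.0),
                           new Doc("product_name2", 7.5));
        minMaxByProduct(docs).forEach((p, mm) ->
            System.out.println(p + " min=" + mm[0] + " max=" + mm[1]));
    }
}
```

This obviously transfers every matching document to the client, which is exactly the cost an itemfacet-style handler would avoid.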
By the way, I was using command=indexversion to verify whether replication is on
or off. Since it seems unreliable, is there a better way to do it?
Thanks,
On Thu, Aug 6, 2009 at 8:43 AM, solr jay wrote:
> You are right. Replication was disabled after the server was restarted, and
> then I saw the behavi
Did a bit more creative searching for a solution and came up with this:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html
I'm using a couple-of-days-old nightly build, so unless there is
something new I should know about, I'm going with that method :)
2009/8/6 Jón Helgi Jónsson
You should stand to benefit from concurrent loading. Certainly the text
analysis would end up being done concurrently; I'm not sure what else benefits
from it but I think there are other things. Ideally you could try a
configurable number of concurrent loads and pick the one that gets the job
If you can reindex, simply rebuild the index with fields replaced by
combining existing fields.
-Yao
-Original Message-
From: David Lojudice Sobrinho [mailto:dalss...@gmail.com]
Sent: Thursday, August 06, 2009 4:17 PM
To: solr-user@lucene.apache.org
Subject: Item Facet
Hi...
Is there
For first-time loads I currently post to
/update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false",
this works well and finishes in about 20 minutes for my workload.
This is mostly CPU bound; I have an 8-core box and it seems one core takes
the brunt of the wo
Hello,
I think you are confusing the size of the data you want to index with the
size of the index. For our indexes (large full text documents) the Solr
index is about 1/3 of the size of the documents being indexed. For 3 TB of
data you might have an index of 1 TB or less. This depends on many
Hi...
Is there any way to group values like shopping.yahoo.com or shopper.cnet.com do?
For instance, I have documents like:
doc1 - product_name1 - value1
doc2 - product_name1 - value2
doc3 - product_name1 - value3
doc4 - product_name2 - value4
doc5 - product_name2 - value5
doc6 - product_name2 -
Bradford, there is an Arabic analyzer in trunk. For Farsi there is
currently a patch available:
http://issues.apache.org/jira/browse/LUCENE-1628
One option is not to detect languages at all.
It could be hard for short queries due to the languages you mentioned
borrowing from each other.
But you do
Hey there,
We're trying to add foreign language support into our new search
engine -- languages like Arabic, Farsi, and Urdu (that don't work with
standard analyzers). But our data source doesn't tell us which
languages we're actually collecting -- we just get blocks of text. Has
anyone here worke
As soon as I started reading your message I started thinking "common
grams", so that is what I would try first, esp. since somebody already
did the work of porting that from Nutch to Solr (see Solr JIRA).
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Ka
I'm investigating a problem I bet some of you have hit before, and exploring
several options to address it. I suspect that this specific IDF scenario is
common enough that it even has a name, though I'm not sure what it would be
called.
The scenario:
Suppose you have a search application focused on t
>
> does DIH call commit periodically, or are things done in one big batch?
>
AFAIK, one big batch.
Cheers
Avlesh
On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote:
> On Mon, Aug 3, 2009 at 12:32 PM, Chantal
> Ackermann wrote:
> > avg-cpu: %user %nice %sys %iowait %idle
> > 1
Hi, would really appreciate some help on this.
I'm doing a category browser for companies. Kind of like a yellow pages.
For each company I store each category the company is in like this:
Example for Boeing would be
03.03.02
which is a fictional id for 'Jets'
The beginning point I display all c
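One way to make such a hierarchy facetable level by level (a sketch; the depth-prefix encoding is a common convention, not something Solr requires) is to index every ancestor path of the id in a multi-valued field:

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryPaths {
    // Hypothetical helper: expands a dotted category id like "03.03.02" into
    // depth-prefixed tokens ("0/03", "1/03.03", "2/03.03.02"). Indexing all of
    // them in one multi-valued field lets you facet one level at a time.
    static List<String> expand(String categoryId) {
        List<String> out = new ArrayList<>();
        String[] parts = categoryId.split("\\.");
        StringBuilder path = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) path.append('.');
            path.append(parts[depth]);
            out.add(depth + "/" + path);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("03.03.02")); // [0/03, 1/03.03, 2/03.03.02]
    }
}
```

Faceting with facet.prefix=0/ then lists the top-level categories; after the user picks '03', facet.prefix=1/03. narrows the facet to its children.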
On Mon, Aug 3, 2009 at 12:32 PM, Chantal
Ackermann wrote:
> avg-cpu: %user %nice %sys %iowait %idle
> 1.23 0.00 0.03 0.03 98.71
>
> Basically, it is doing very little? *scratch*
How often is commit being called? (a Lucene commit syncs all of the
index files, so a cra
>
> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?
>
I don't think so.
In my case, I had 9 (JDBC) entities for each document. Most of these
entities returned a single column and limited nu
Hi all,
to keep this thread up to date... ;-)
d) jdbc batch size
changed to 10. (Was default: 500, then 1000)
The problem with my dih setup is that the root entity query returns a
huge set (all ids that shall be indexed). A larger fetchsize would be
good for that query.
The nested entity, ho
Design so that you can handle the load with one server down (N+1
sizing), then take one server out for any maintenance. Simple and
works fine.
wunder
On Aug 6, 2009, at 9:25 AM, Robert Petersen wrote:
Here is another idea. With solr multicore you can dynamically spin up
extra cores and br
Here is another idea. With solr multicore you can dynamically spin up
extra cores and bring them online. I'm not sure how well this would
work for us since we have hard coded the names of the cores we are
hitting in our config files.
-Original Message-
From: Brian Klippel [mailto:br...@t
Something similar has been discussed earlier. Go through this thread -
http://www.lucidimagination.com/search/document/b5977650557f50cb/problem_with_query_parser
PS: Solr is pronounced as "Solar" but written without the "a".
Cheers
Avlesh
On Thu, Aug 6, 2009 at 7:18 PM, Deepak VSVK wrote:
> Hi
You could create a new "working" core, then call the swap command once
it is ready. Then remove the work core and delete the appropriate index
folder at your convenience.
-Original Message-
From: Robert Petersen [mailto:rober...@buy.com]
Sent: Wednesday, August 05, 2009 6:41 PM
To: solr
You are right. Replication was disabled after the server was restarted, and
then I saw the behavior. After I added some data, command "indexversion"
returns the right values. So it seems Solr behaved correctly.
Thanks,
2009/8/5 Noble Paul നോബിള് नोब्ळ्
> how is the replicationhandler configure
Hi everyone,
I'm indexing several documents that contain words that the StandardTokenizer
cannot detect as tokens. These are words like
C#
.NET
C++
which are important for users to be able to search for, but get treated as
"C", "NET", and "C".
How can I create a list of words that should be
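One common approach (sketched here; the field type name and file name are assumptions) is to tokenize on whitespace only and protect the special terms from further splitting via a protected-words file listing C#, .NET, C++, etc.:

```xml
<!-- Hypothetical field type: whitespace tokenization keeps "C#" and ".NET"
     whole; protwords.txt exempts them from word-delimiter splitting. -->
<fieldType name="text_code" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the WordDelimiterFilter runs before lowercasing, the entries in the protected-words file need to match the tokens as they appear before the LowerCaseFilter.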
I'm guessing it is because you have your Spell checker mapped to the
"spellchecker" request handler, but you are asking the standard
request handler to build the spell checker. Unless you've modified
the Standard Req Handler, it is not spell check aware.
Try http://localhost:8983/solr/sele
Great! *bow*
Thanks,
Chantal
this should do the trick
On Thu, Aug 6, 2009 at 6:41 PM, Chantal
Ackermann wrote:
> Hi again,
>
> 1.4 runs fine for me now, but I'm still struggling with the correct delete
> query. There is little to no documentation for the new special commands,
> and I have problems guessing the correct setup from reading through th
Hi,
In my application I am trying to search with some special characters like $ and #,
and Solr is returning all the available search results. Some of the characters
like _ and . are not being encoded in the search URL. Does anyone have any idea
what the root cause of this could be?
I am using the Jetty server
Hi again,
1.4 runs fine for me now, but I'm still struggling with the correct
delete query. There is little to no documentation for the new
special commands, and I have problems guessing the correct setup from
reading through the code. SOLR-1060 is not enough help.
I've come up with a se
Hi,
Solr is fine out of RAM if you don't change it (build and then let it cache
what it needs). The RAM is needed when you constantly pepper it with updates
and commits. If you can have the logs update certain shards and then merge
those indexes periodically to machines you can leave alone - this
>
> if I know my fields and there are not many in the lucene index, I should not
> face any problem creating a schema, or are there any pitfalls which I should
> be aware of?
Nothing specific. The creation of schema should be very straightforward.
Just make sure you use the right field types.
Chee
But getting the schema right would be the challenge.
If I know my fields and there are not many in the lucene index, I should not
face any problem creating a schema, or are there any pitfalls which I should
be aware of?
Thanks for such quick replies guys.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> yea
Yeah, the big part was missed. You need to set up a schema.xml matching
the field names and types, and you would need a solrconfig.xml. But
getting the schema right would be the challenge.
On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller wrote:
> You're kidding, right? :)
>
> Noble Paul നോബിള് नोब्ळ् wrote
>
> What about the schema and querying? There should be some changes to the
> Solr schema, I think. Correct me if I am wrong.
>
Of course! You have to create your own schema inside the schema.xml and
adjust values inside solrconfig.xml at the bare minimum to get started.
Cheers
Avlesh
On Thu, Aug
I am also interested in knowing! Does it work?
Cheers
Avlesh
On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller wrote:
> You're kidding, right? :)
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>
>> just copy the whole index into /index and start Solr. That
>> should work just fine
>>
>> On Thu, Aug 6, 2009 at 5:17 PM,
What about the schema and querying? There should be some changes to the
Solr schema, I think. Correct me if I am wrong.
2009/8/6 Noble Paul നോബിള് नोब्ळ्
> just copy the whole index into /index and start Solr. That
> should work just fine
>
> On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut
> wrote:
> > H
You're kidding, right? :)
Noble Paul നോബിള് नोब्ळ् wrote:
just copy the whole index into /index and start Solr. That
should work just fine
On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut wrote:
Hi,
Is there a way to import existing Lucene indexes into Solr? I have a huge
Lucene index which I want to impo
just copy the whole index into /index and start Solr. That
should work just fine
On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut wrote:
> Hi,
> Is there a way to import existing Lucene indexes into Solr? I have a huge
> Lucene index which I want to import into the Solr server.
> Regards,
> Ninad Raut.
>
--
-
Hi,
Is there a way to import existing Lucene indexes into Solr? I have a huge
Lucene index which I want to import into the Solr server.
Regards,
Ninad Raut.
You have to quote values that include whitespace:
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/huenzhao/search/solr"
or to make it accessible for other paths as well:
export SOLR_HOME=/home/huenzhao/search/solr
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME"
Cheers,
Chantal
Go through this thread first - http://markmail.org/message/bannl2fpblt5sqlw
If it still does not help, post back your field type definition in
schema.xml
Cheers
Avlesh
On Thu, Aug 6, 2009 at 3:46 PM, Radha C. wrote:
> Hi,
>
>
> I have documents contain word "healthcare articles". I need to matc
Hi,
I have documents containing the word "healthcare articles". I need to match the
"healthcare articles" documents
for the query strings "helath", "articles"...
I tried q="health*", q=helath*, q="heath*articles" but everything returns
empty results. When I try q="healthcare artilces", the search re
Hi all,
I know how to configure solr.home using Tomcat 6, but I don't know how to
set solr.home using Glassfish (V2.1). I have tried to set solr.home in
.profile as follows:
export solr.home=/home/huenzhao/search/solr
export solr/home=/home/huenzhao/search/solr
export solr.solr.home=/hom
On Thu, Aug 6, 2009 at 12:24 PM, Ninad Raut wrote:
> Hi,
> I have a search engine on Solr. Also I have a remote web application which
> will be using the Solr Indexes for search.
> I have three scenarios:
> 1) Transfer the Indexes to the Remote Application.
>
> - This will reduce load on the actu