Re: how to get all the docIds in the search result?

2009-07-22 Thread Avlesh Singh
query.setRows(Integer.MAX_VALUE); Cheers Avlesh On Thu, Jul 23, 2009 at 8:15 AM, shb wrote: > When I use > SolrQuery query = new SolrQuery(); > query.set("q", "issn:0002-9505"); > query.setRows(10); > QueryResponse response = server.query(query); > I only
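
A minimal SolrJ sketch of the tip above, assuming a Solr 1.3/1.4-era client (CommonsHttpSolrServer), a hypothetical server URL, and that the uniqueKey field is called "id". Instead of hard-coding Integer.MAX_VALUE it first asks for zero rows to learn the total hit count, then re-queries with exactly that many rows:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class AllDocIds {
        public static void main(String[] args) throws Exception {
            // hypothetical Solr URL; adjust to your installation
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery query = new SolrQuery();
            query.set("q", "issn:0002-9505");

            // first request: rows=0, we only want numFound
            query.setRows(0);
            long numFound = server.query(query).getResults().getNumFound();

            // second request: fetch all matching documents in one go
            query.setRows((int) numFound);
            QueryResponse response = server.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id")); // "id" assumed to be the uniqueKey
            }
        }
    }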

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Alternatively, you can write your own EntityProcessor and just override nextRow(). I guess you can still use the JdbcDataSource On Wed, Jul 22, 2009 at 10:05 PM, Chantal Ackermann wrote: > Hi all, > > this is my first post, as I am new to SOLR (some Lucene exp). > > I am trying to load data fro
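
A rough sketch of that EntityProcessor idea, assuming the Solr 1.4 DIH API and a hypothetical pivoted layout where each DB row carries (id, field_name, field_value) and consecutive rows with the same id belong to one document (the SQL query is assumed to ORDER BY id; all column names are made up):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.SqlEntityProcessor;

    // Sketch only: folds several pivoted rows into one flat Solr document per id.
    public class PivotedRowEntityProcessor extends SqlEntityProcessor {
        private Map<String, Object> lookAhead;

        @Override
        public Map<String, Object> nextRow() {
            Map<String, Object> row = (lookAhead != null) ? lookAhead : super.nextRow();
            lookAhead = null;
            if (row == null) return null;          // end of the result set

            Map<String, Object> doc = new HashMap<String, Object>();
            Object id = row.get("id");
            doc.put("id", id);
            doc.put((String) row.get("field_name"), row.get("field_value"));

            // keep merging rows until the id changes, then remember the extra row
            Map<String, Object> next;
            while ((next = super.nextRow()) != null) {
                if (!id.equals(next.get("id"))) { lookAhead = next; break; }
                doc.put((String) next.get("field_name"), next.get("field_value"));
            }
            return doc;
        }
    }

The entity in data-config.xml would then reference the class via processor="PivotedRowEntityProcessor" (fully qualified name in practice) while keeping the existing JdbcDataSource.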

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
look at this http://wiki.apache.org/solr/DIHQuickStart#head-532678fa5d0d9b33880abeb4d4995562014f8ef9 to know how to fetch data from multiple tables On Wed, Jul 22, 2009 at 4:57 PM, Julian Davchev wrote: > Well yes, transformation is required. But it's like data coming from > multiple tables.. etc.
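
For orientation, the usual shape of a data-config.xml that builds one document from several tables is a parent entity with nested child entities; the driver, table and column names below are only placeholders:

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
      <document>
        <entity name="product" query="select id, name from product">
          <field column="id"   name="id"/>
          <field column="name" name="name"/>
          <!-- child entity: one extra query per parent row, joined on the product id -->
          <entity name="category"
                  query="select description from category where product_id='${product.id}'">
            <field column="description" name="category"/>
          </entity>
        </entity>
      </document>
    </dataConfig>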

how to get all the docIds in the search result?

2009-07-22 Thread shb
When I use SolrQuery query = new SolrQuery(); query.set("q", "issn:0002-9505"); query.setRows(10); QueryResponse response = server.query(query); I can only get the 10 ids in the response. How can I get all the docIds in the search result? Thanks.

Re: LocalSolr - order of fields on xml response

2009-07-22 Thread Ryan McKinley
ya... 'expected', but perhaps not ideal. As is, LocalSolr munges the document on its way out the door to add the distance. When LocalSolr makes it into the source, it will likely use a method like: https://issues.apache.org/jira/browse/SOLR-705 to augment each document with the calculated

Re: SolrException - Lock obtain timed out, no leftover locks

2009-07-22 Thread danben
Sorry, I thought I had removed this posting. I am running Solr over HTTP, but (as you surmised) I had a concurrency bug. Thanks for the response. Dan hossman wrote: > > > My only guess here is that you are using SolrJ in an embedded sense, not > via HTTP, and something about the code you h

Re: SolrException - Lock obtain timed out, no leftover locks

2009-07-22 Thread Chris Hostetter
My only guess here is that you are using SolrJ in an embedded sense, not via HTTP, and something about the code you have in your MyIndexers class causes two different threads to attempt to create two different cores (or perhaps the same core) using identical data directories at the same time.

Re: Behaviour when we get more than 1 million hits

2009-07-22 Thread Erick Erickson
That's still not very useful. Additional processing? Where, some client that you return all the data to? In which case SOLR is the least of your concerns, your network speed counts more. At a blind guess I'd worry more about how you're doing your "additional processing" than Solr. Erick On Wed, J

Re: Storing string field in solr.ExternalFieldFile type

2009-07-22 Thread Erick Erickson
Hoping the experts chime in if I'm wrong, but... as far as I know, while storing a field increases the size of an index, it doesn't have much impact on the search speed. Which you could pretty easily test by creating the index both ways and firing off some timing queries and comparing. Althoug

RE: solr 1.3.0 and Oracle Fusion Middleware

2009-07-22 Thread Hall, David
Thanks for the feedback... I tried to add what is below directly to the web.xml file right after the tag and bounce the OC4J - still same issue. false I checked other applications running on 10.1.3 OC4J and they are also using 2.3 web.xml. Regardless - I tried to change it to 2.

Re: excluding certain terms from facet counts when faceting based on indexed terms of a field

2009-07-22 Thread Chris Hostetter
: I am faceting based on the indexed terms of a field by using facet.field. : Is there any way to exclude certain terms from the facet counts? if you're talking about a lot of terms, and they're going to be the same for *all* queries, the best approach is to strip them out when indexing (StopWo

Re: Random Slowness

2009-07-22 Thread Otis Gospodnetic
Or simply attach to the JVM with Jconsole and watch the GC from there. You'd have to watch things (logs and jconsole) closely though, and correlate the slow query periods with a GC spike. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Ed

Re: Synonyms from index

2009-07-22 Thread Otis Gospodnetic
Hi, There is nothing built-in. It might be possible to infer if two words are synonyms, but that's really not strictly a search thing, so it's not likely to be added to Solr in the near future. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > F

Re: Best approach to multiple languages

2009-07-22 Thread Grant Ingersoll
Typically there are three options that people do: 1. Put 'em all in one big field 2. Split Fields (as you and others have described) - not sure why no one ever splits on documents, which is viable too, but comes with repeated data 3. Split indexes For your case, #1 isn't going to work sinc

LocalSolr - order of fields on xml response

2009-07-22 Thread Daniel Cassiano
Hi folks, When I do some query with LocalSolr to get the geo_distance, the order of XML fields is different from that of a standard query. It's a simple query, like this: http://myhost.com:8088/solr/core/select?qt=geo&x=-46.01&y=-23.01&radius=15&sort=geo_distance+asc&q=*:* Is this an expected behavi

Re: Best approach to multiple languages

2009-07-22 Thread Olivier Dobberkau
Am 22.07.2009 um 18:31 schrieb Ed Summers: In case you are curious I've attached a copy of our schema.xml to give you an idea of what we did. Thanks for sharing! -- Olivier Dobberkau

Re: Best approach to multiple languages

2009-07-22 Thread Andrew McCombe
Hi We will know the user's language choice before searching. Regards Andrew 2009/7/22 Grant Ingersoll > How do you want to search those descriptions? Do you know the query > language going in? > > > On Jul 22, 2009, at 6:12 AM, Andrew McCombe wrote: > > Hi >> >> We have a dataset that conta

DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-22 Thread Chantal Ackermann
Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row bas

Re: Best approach to multiple languages

2009-07-22 Thread Ed Summers
On Wed, Jul 22, 2009 at 11:35 AM, Grant Ingersoll wrote: >> My initial thoughts are to index each description as a separate field and >> append the language identifier to the field name, for example, three >> fields >> with description_en, description_de, description_fr.  Is this the best >> appro

Re: Best approach to multiple languages

2009-07-22 Thread Grant Ingersoll
How do you want to search those descriptions? Do you know the query language going in? On Jul 22, 2009, at 6:12 AM, Andrew McCombe wrote: Hi We have a dataset that contains productname, category and descriptions. The descriptions can be in one or more different languages. What would b

Re: US/UK/CA/AU English support

2009-07-22 Thread Grant Ingersoll
On Jul 22, 2009, at 5:09 AM, prerna07 wrote: Hi, 1) Out of US/UK/CA/AU, which English does Solr support? Please clarify what you mean by "support"? The only things in Solr that are potentially language dependent are the Tokenizers and TokenFilters, and those are completely pluggable. For

Re: solr 1.3.0 and Oracle Fusion Middleware

2009-07-22 Thread Mark Miller
Let's keep this communication on the list so others can benefit and chime in. What about the filter-dispatched-requests-enabled setting? Perhaps it doesn't use the weblogic.xml file anymore and you'll need to find the new way to configure that setting? From what I can see, that setting will default

Re: Random Slowness

2009-07-22 Thread Ed Summers
On Wed, Jul 22, 2009 at 10:44 AM, Jeff Newburn wrote: > How do I go about enabling the gc logging for solr? It depends on how you are running Solr. You basically want to make sure that when the JVM is started up with the java command, it gets some additional arguments [1]. So for example if you
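
For example, with the bundled Jetty ("java -jar start.jar") you could start Solr roughly like this; the flags are the standard Sun 1.6 JVM GC-logging options, and the log file path is arbitrary:

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xloggc:logs/gc.log -jar start.jar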

Re: Random Slowness

2009-07-22 Thread Jeff Newburn
Ed, How do I go about enabling the gc logging for solr? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Ed Summers > Reply-To: > Date: Wed, 22 Jul 2009 10:39:03 -0400 > To: > Subject: Re: Random Slowness > > I haven't read this whole thread, so mayb

Re: Random Slowness

2009-07-22 Thread Ed Summers
I haven't read this whole thread, so maybe it's already come up. Have you turned on the garbage collection logging to see if the jvm is busy cleaning up when you are seeing the slowness? Maybe the jvm is struggling to keep the heap size within a particular limit? //Ed On Wed, Jul 22, 2009 at 10:2

Re: Random Slowness

2009-07-22 Thread Jeff Newburn
We can never reproduce the slowness with the same query. As soon as we try to run them again they are fine. I have even tried running the same query the next day and it is fine. All of our requests go through our dismax handler which is part of why it is so weird. Most queries are fine, however

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Bill Au
You can also do table joins in a SQL select to pick out the fields you want from multiple tables. You may want to use temporary tables during processing. Once you get the data the way you want it, you can use the CSV request handler to load in the output of the SQL select. Bill On Wed, Jul 22,
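
A sketch of that last step, assuming the CSV handler is mapped at /update/csv as in the example solrconfig.xml and the SQL output was dumped to a file named products.csv with a header row:

    curl 'http://localhost:8983/solr/update/csv?commit=true' \
         --data-binary @products.csv \
         -H 'Content-type: text/plain; charset=utf-8'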

Re: Random Slowness

2009-07-22 Thread Erik Hatcher
On Jul 21, 2009, at 6:52 PM, Jeff Newburn wrote: We are experiencing random slowness on certain queries. I have been unable to diagnose what the issue is. We are using SOLR 1.4 and 99.99% of queries return in under 250 ms. The remaining queries are returning in 2-5 seconds for no appare

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
A transformer can be written in any language. If you are using Java 6, JavaScript support comes out of the box. http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9 On Wed, Jul 22, 2009 at 4:57 PM, Julian Davchev wrote: > Well yes, transformation is required.

Re: DIH example explanation

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Any string that is templatized in DIH can have variables like this: ${a.b}. For instance, look at the following: url="http://xyz.com/atom/${dataimporter.request.foo}". If you pass a parameter foo=bar when you invoke the command, the url invoked becomes http://xyz.com/atom/bar. The variable can come
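
Put together, a sketch of such a templatized entity and the request that fills in the variable (the entity name, feed URL and field are placeholders; XPathEntityProcessor is assumed as in the slashdot example):

    <entity name="feed"
            processor="XPathEntityProcessor"
            url="http://xyz.com/atom/${dataimporter.request.foo}"
            forEach="/feed/entry">
      <field column="title" xpath="/feed/entry/title"/>
    </entity>

    http://localhost:8983/solr/dataimport?command=full-import&foo=bar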

Synonyms from index

2009-07-22 Thread Pooja Verlani
Hi, Is there a possible way to generate synonyms from the index? I have an index with lots of searchable terms that turn out to have synonyms, and users too have different synonyms. If not, then the only way is to learn from the query logs and click logs, but in case there exists one, please s

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Julian Davchev
Well yes, transformation is required. But it's like data coming from multiple tables.. etc. It's not like getting one row from a table and possibly transforming it and using it. I am thinking perhaps to create some tables (views) that will have the data ready/flattened and then simply feed it.

Re: DIH example explanation

2009-07-22 Thread Antonio Eggberg
:) thank you paul! and it works! I have one more stupid question about the wiki. "url (required) : The url used to invoke the REST API. (Can be templatized)." How do you templatize the URL? My URL's are being updated all the time by an external program. i.e. list of atom sites it's a text file.

[ApacheCon US] Travel Assistance

2009-07-22 Thread Grant Ingersoll
The Travel Assistance Committee is taking in applications for those wanting to attend ApacheCon US 2009 (Oakland) which takes place between the 2nd and 6th November 2009. The Travel Assistance Committee is looking for people who would like to be able to attend ApacheCon US 2009 who may nee

Re: Behaviour when we get more than 1 million hits

2009-07-22 Thread Rakhi Khatwani
Hi, There is this particular scenario where I want to search for a product and I get a million records which will be given for further processing. Regards, Raakhi On Mon, Jul 13, 2009 at 7:33 PM, Erick Erickson wrote: > It depends (tm) on what you try to do with the results. You really need

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Avlesh Singh
As Noble has already said, "transforming" content before indexing is a very common requirement. DataImportHandler's Transformer lets you achieve this. Read up on the same here - http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9 Cheers Avlesh 2009/7/22 Noble
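
A bare-bones custom Transformer, for orientation; this assumes the Solr 1.4 DIH API, and the column/field names ("name_and_code", "name", "code") are invented for the example:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    // Sketch: splits one DB column like "Widget|W-42" into two Solr fields.
    public class NameCodeTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            String raw = (String) row.get("name_and_code");
            if (raw != null && raw.contains("|")) {
                String[] parts = raw.split("\\|", 2);
                row.put("name", parts[0]);
                row.put("code", parts[1]);
            }
            return row;
        }
    }

It is plugged in on the entity in data-config.xml, e.g. transformer="com.example.NameCodeTransformer".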

Re: Best approach to multiple languages

2009-07-22 Thread Julian Davchev
Hi, We have such a case... we don't want to search in all of those languages at once but just one of them. So we took the approach of different indexes for each language. From what I know, it also helps avoid skewing the relevance stats. You know, how much an index is used etc etc. If you dig in

Best approach to multiple languages

2009-07-22 Thread Andrew McCombe
Hi We have a dataset that contains productname, category and descriptions. The descriptions can be in one or more different languages. What would be the recommended way of indexing these? My initial thoughts are to index each description as a separate field and append the language identifier to
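
For the per-field approach described above, the schema.xml side would look roughly like this; the field-type names (text_en etc.) are placeholders for types you would define with language-specific analyzers (stemmer, stopword list):

    <field name="description_en" type="text_en" indexed="true" stored="true"/>
    <field name="description_de" type="text_de" indexed="true" stored="true"/>
    <field name="description_fr" type="text_fr" indexed="true" stored="true"/>

Since the language choice is known before searching, the client can then simply query the matching field (e.g. q=description_de:...).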

Re: Word frequency count in the index

2009-07-22 Thread Pooja Verlani
Hi Grant, thanks for your reply. I have one more doubt: if I use Luke's request handler in solr for this issue, the top terms I get, are they the highest term frequency or highest document frequency terms? I would like to get terms that occur max in a document and those documents form a good percentage in the t

US/UK/CA/AU English support

2009-07-22 Thread prerna07
Hi, 1) Out of US/UK/CA/AU, which English does Solr support? 2) PhoneticFilterFactory performs search for similar sounding words. For example: a search on carat will give results for carat, caret and carrat. I also observed that PhoneticFilterFactory also supports linguistic variation for US/UK/C

Re: DIH example explanation

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The point is that namespaces are ignored while DIH reads the XML. So just use the part after the colon (:) in your xpath expressions and it should just work. On Wed, Jul 22, 2009 at 2:16 PM, Antonio Eggberg wrote: > Hi, > > I am looking at the slashdot example and I am having a hard time understan

Re: All in one index, or multiple indexes?

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep in mind that every time a commit is done all the caches are thrown away. If updates for each of these indexes happen at different times then the caches get invalidated each time you commit. So in that case a smaller index helps. On Wed, Jul 8, 2009 at 4:55 PM, Tim Sell wrote: > Hi, > I am wonderi

Re: Boosting Code

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
public Map transform(Map row , Context ctx){ row.put("$docBoost", 3445); return row; } On Wed, Jul 22, 2009 at 12:02 PM, prerna07 wrote: > > Hi, > > I have to boost document, Can someone help me understanding how can we > implement docBoost via transformer. > > Thanks, > Prerna > > > Mar
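
One way to wire that snippet up as a standalone class; this is a sketch against the Solr 1.4 DIH Transformer API, and the boost value is just the number from the mail above:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    // Sketch: sets an index-time document boost via DIH's special $docBoost key.
    public class DocBoostTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            row.put("$docBoost", 3445);   // any numeric value DIH can read as a boost
            return row;
        }
    }

Register it on the entity with transformer="DocBoostTransformer" (fully qualified class name in practice).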

DIH example explanation

2009-07-22 Thread Antonio Eggberg
Hi, I am looking at the slashdot example and I am having a hard time understanding the following, from the wiki == "You can use this feature for indexing from REST API's such as rss/atom feeds, XML data feeds, other Solr servers or even well formed xhtml documents. Our XPath support has its

Re: importing lots of db data. specially formatted. what is the fastest approach?

2009-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
If each field from the db goes to a separate field in Solr as-is, then it is very simple. If you need to split/join fields before feeding them into Solr fields, you may need to apply transformers. An example of how your db field looks and how you wish it to look in Solr would be helpful. On