Re: Text classification with Solr

2009-01-27 Thread Hannes Carl Meyer
>>Instead of indexing documents about 'sports' and searching for hits >>based upon 'basketball', 'football' etc.. I simply want to index the >>taxonomy and classify documents into it. This is an ancient >>AI/Data-Mining discipline.. but the standard methods of 'indexing' the >>taxonomy are/were

multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have downloaded Solr 1.3.0. I need to index Chinese content; for this I have defined a new field in the schema as I believe Solr 1.3 already has the CJKAnalyzer by default. My schema in the testing stage has only 2 fields. However, when I index the Chinese text into

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Did you commit after the updates? 2009/1/27 revathy arun > Hi, > > I have downloaded Solr 1.3.0. > > I need to index Chinese content; for this I have defined a new field in the > schema > > as > > > positionIncrementGap="100"> > > I believe Solr 1.3 already

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have committed. The admin page does not show any docs pending or committed, or any errors. Regards, Sujatha On 1/27/09, Shalin Shekhar Mangar wrote: > > Did you commit after the updates? > > 2009/1/27 revathy arun > > > Hi, > > > > I have downloaded Solr 1.3.0. > > > > I need to index chines

Re: multilanguage prototype

2009-01-27 Thread revathy arun
These are the stats of my update handler, but I still don't see any index created. *stats: *commits : 7 autocommits : 0 optimizes : 2 docsPending : 0 adds : 0 deletesById : 0 deletesByQuery : 0 errors : 0 cumulative_adds : 0 cumulative_deletesById : 0 cumulative_deletesByQuery : 0 cumulative_errors : 0

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Are you looking for it in the right place? It is very unlikely that a commit happens and the index is not created. The index is usually created inside the data directory as configured in your solrconfig.xml. Can you search for *:* from the Solr admin page and see if documents are returned? On Tue, Jan
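The check suggested above (search for *:* and see whether anything comes back) can also be scripted. The following is a minimal sketch, not part of the original thread: the base URL and the sample response body are assumptions, and the helper names are made up for illustration. It only builds the request URL and reads numFound from an XML response of the usual shape; fetching the URL is left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def match_all_url(base_url):
    """Build a select URL that matches every document but asks for no rows."""
    return base_url.rstrip("/") + "/select?" + urlencode({"q": "*:*", "rows": 0})


def num_found(response_xml):
    """Read the numFound attribute from the <result> element of an XML response."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    return int(result.get("numFound"))


# Hypothetical response body for an empty index (assumed shape, for illustration).
SAMPLE = (
    "<response>"
    '<lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="0" start="0"/>'
    "</response>"
)

if __name__ == "__main__":
    # Assumed local Solr location; adjust to the actual deployment.
    print(match_all_url("http://localhost:8983/solr"))
    print(num_found(SAMPLE))
```

A numFound of 0 after a successful commit would point at the indexing step rather than at the index location.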

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi Shalin, The admin page stats are as follows searcherName : searc...@1d4c3d5 main caching : true numDocs : 0 maxDoc : 0 *name: * /update *class: * org.apache.solr.handler.XmlUpdateRequestHandler *version: * $Revision: 690026 $ *description: * Add documents with XML * stats: *handlerStart :

Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
Solr 1.3. I'm trying to get highlighting working, with no luck so far. Query with params q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title+description finds 182 documents in my index. All of the top 10 hits contain the word "cyrus", but the highlights list is empty. The fields "title" and
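For reference, the query string above can be assembled programmatically so that the space-separated field list in hl.fl is encoded correctly. This is a sketch only: the base URL and the function name are assumptions, while the parameter names and values are the ones quoted in the message.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, fields):
    """Build a select URL that requests highlighting on the given fields."""
    params = [
        ("q", query),
        ("fl", "*,score"),
        ("qt", "standard"),
        ("hl", "true"),
        # hl.fl takes a space-separated field list; urlencode turns the space into '+'.
        ("hl.fl", " ".join(fields)),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr location; adjust to the actual deployment.
    print(highlight_query_url("http://localhost:8983/solr", "cyrus", ["title", "description"]))
```

Building the URL this way does not by itself make highlighting work; it only rules out encoding problems in the request.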

Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
I turned these fields to indexed + stored but the results are exactly the same, no matter if I search in these fields or elsewhere. Message written on 2009-01-27, at 13:09, by Jarek Zgoda: Solr 1.3 I'm trying to get highlighting working, with no luck so far. Query with params

Re: multilanguage prototype

2009-01-27 Thread Erik Hatcher
errors: 11 What were those? My hunch is your indexer had issues. What did Solr output into the console or log during indexing? Erik On Jan 27, 2009, at 6:56 AM, revathy arun wrote: Hi Shalin, The admin page stats are as follows searcherName : searc...@1d4c3d5 main caching : true

Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
Finally found that the fields have to have an analyzer to be highlighted. Neat. Can I ask somebody to document all these requirements? Message written on 2009-01-27, at 13:49, by Jarek Zgoda: I turned these fields to indexed + stored but the results are exactly the same, no m

Re: Error in Integrating JBoss 4.2 and Solr-1.3.0:

2009-01-27 Thread maveen
I am also getting the same issue. Did anyone find a solution for this? Please respond. sbutalia wrote: > > I'm having the same issue.. have you had any progress with this? > -- View this message in context: http://www.nabble.com/Error-in-Integrating-JBoss-4.2-and-Solr-1.3.0%3A-tp2020203

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Walter Underwood
Making requests in parallel, using the default connection manager, which is multi-threaded, and we are reusing a single CommonsHttpSolrServer for all requests. wunder On 1/26/09 10:59 PM, "Noble Paul നോബിള്‍ नोब्ळ्" wrote: > are you making requests in parallel ? > which ConnectionManager are y

Re: fastest way to index/reindex

2009-01-27 Thread Ian Connor
When you query by *:*, what order does it use? Is there a chance they will come in a different order as you page through the results (and miss/duplicate some)? Is it best to put the order explicitly by 'id' or is that implied already? On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor wrote: > *:* took

Re: fastest way to index/reindex

2009-01-27 Thread Erik Hatcher
*:* will default to sorting by document insertion order (Lucene's document id, _not_ your Solr uniqueKey). And no, you won't miss any by paging - order will be maintained. Erik On Jan 27, 2009, at 9:52 AM, Ian Connor wrote: When you query by *:*, what order does it use. Is there a
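Paging through a *:* result set as discussed above amounts to stepping the start parameter by the page size. A minimal sketch follows, assuming the index is not modified while paging; the function name and the optional sort argument are illustrative and not taken from the thread.

```python
def page_params(total, rows, sort=None):
    """Yield one parameter dict per page, covering `total` documents in steps of `rows`."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for start in range(0, total, rows):
        params = {"q": "*:*", "start": start, "rows": rows}
        if sort is not None:
            # Optional explicit ordering, e.g. "id asc"; omitted by default.
            params["sort"] = sort
        yield params


if __name__ == "__main__":
    for params in page_params(25, 10):
        print(params)
```

Each dict can be passed to whatever HTTP client is in use; the last page is simply shorter than the others.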

Re: solrj delete by Id problem

2009-01-27 Thread Parisa
I found how the issue is created. When Solr warms up the new searcher with cacheLists, if the queryResultCache is enabled, the issue is created. Note: as I mentioned before, I commit with waitFlush=false and waitSearcher=false, so it has a problem in case the queryResultCache is on, but I don't know

Re: solrj delete by Id problem

2009-01-27 Thread Shalin Shekhar Mangar
On Tue, Jan 27, 2009 at 8:51 PM, Parisa wrote: > > I found how the issue is created. When Solr warms up the new searcher with > cacheLists, if the queryResultCache is enabled, the issue is created. > > Note: as I mentioned before, I commit with waitFlush=false and > waitSearcher=false > > so it has

Re: QParserPlugin

2009-01-27 Thread Karl Wettin
So it was me defining it in schema.xml rather than solrconfig.xml. 17:17 < erikhatcher> where are you defining the qparser plugin? 17:18 < erikhatcher> it's very odd... if it isn't picking them up but you reference them, it would certainly give an error 17:18 < karlwettin> as a first level chil

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer wrote: > Yeah, know it, the challenge on this method is the calculation of the score > and parametrization of thresholds. Not as worried about score itself as the score thresholds for prediction in/out. > Is it really necessary to use Solr for

query with stemming, prefix and fuzzy?

2009-01-27 Thread Gert Brinkmann
Hello, I am trying to get Solr to properly work. I have set up a Solr test server (using jetty as mentioned in the tutorial). Also I had to modify the schema.xml so that I have different fields for different languages (with their own stemmers) that occur in the content management system that I am

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Yonik Seeley
That's interesting. SolrJ doesn't touch HttpClient params if one is provided in the constructor. I guess I'd try to sniff the headers first and see if any difference sticks out between the clients. I normally just use netcat and pretend to be the Solr server. -Yonik On Tue, Jan 27, 2009 at 1

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Ryan McKinley
if you use this constructor: public CommonsHttpSolrServer(URL baseURL, HttpClient client) then solrj never touches the HttpClient configuration. I normally reuse a single CommonsHttpSolrServer as well. On Jan 27, 2009, at 9:52 AM, Walter Underwood wrote: Making requests in parallel, using

multiple indexes

2009-01-27 Thread Jae Joo
Hi, I would like to know how this can be implemented. Index1 has fields id,1,2,3 and index2 has fields id,5,6,7. The ID in both indexes is the unique id. Can I use "a kind of" distributed search and/or multicore to search, sort, and facet through 2 indexes (index1 and index2)? Thanks, Jae joo

Re: Setting dataDir in multicore environment

2009-01-27 Thread Mark Ferguson
Oh I see, thanks for the clarification. Unfortunately this brings me back to the same problem I started with: implicit properties aren't available when managing indexes through the REST API. I know there is a patch in the works for this issue but I can't wait for it. Is there any way to share the solr

Re: Text classification with Solr

2009-01-27 Thread Karl Wettin
On 27 Jan 2009, at 17:23, Neal Richter wrote: Is it really necessary to use Solr for it? Things go much faster with the Lucene low-level API, and much faster still if you're loading the classification corpus into RAM. Good points. At the moment I'd rather have a daemon with a service API.. as

index size tripled during optimization

2009-01-27 Thread Qingdi
Hi, Starting about one week ago, our index size triples during optimization. The current index statistics are: numDocs : 192702132 size: 76G. We optimize after every 6M document updates. Since we keep getting new data, the index size increases every day. Before, the index size was on

Indexing documents in multiple languages

2009-01-27 Thread Alejandro Valdez
Hi, I plan to use Solr to index a large number of documents extracted from email bodies. Such documents could be in different languages, and a single document could be in more than one language. In the same way, the query string could be words in different languages. I read that a common approac

Re: Indexing documents in multiple languages

2009-01-27 Thread Erick Erickson
First, I'd search the mail archive for the topic of languages, it's been discussed often and there's a wealth of information that might be of benefit, far more information than I can remember. As to whether your approach will be "too big, too slow...", you really haven't given enough information t

Optimizing & Improving results based on user feedback

2009-01-27 Thread Matthew Runo
Hello folks! We've been thinking about ways to improve organic search results for a while (really, who hasn't?) and I'd like to get some ideas on ways to implement a feedback system that uses user behavior as input. Basically, it'd work on the premise that what the user actually clicked o

Re: Text classification with Solr

2009-01-27 Thread Grant Ingersoll
I guess I've been called to the chalkboard... I haven't looked specifically at putting the taxonomy in Lucene/Solr, but it is an interesting idea. In reading the paper you mentioned, there are some interesting ideas there and Solr could obviously just as easily be used as Lucene, I think.

Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Walter Underwood
I've been thinking about the same thing. We have a set of queries that defy straightforward linguistics and ranking, like figuring out how to match "charlie brown" to "It's the Great Pumpkin, Charlie Brown" in October and to "A Charlie Brown Christmas" in December. I don't have any solutions yet,

Tools for Managing Synonyms, Elevate, etc.

2009-01-27 Thread
I'm considering building some tools for our internal non-technical staff to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt so software developers don't have to maintain them. Before my team starts building these tools, has anyone done this before? If so, are these tools avai

Re: Highlighting does not work?

2009-01-27 Thread Mike Klaas
They are documented in http://wiki.apache.org/solr/FieldOptionsByUseCase and in the FAQ, but I agree that it could be more readily accessible. -Mike On 27-Jan-09, at 5:26 AM, Jarek Zgoda wrote: Finally found that the fields have to have an analyzer to be highlighted. Neat. Can I ask so

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Jon Baer
Could it be the framework you are using around it? I know some IOC containers will auto pool objects underneath as a service without you really knowing it is being done or has to be explicitly turned off. Just a thought. I use a single server for all requests behind a Hivemind setup ... umm not

Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
I shall give a patch today On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson wrote: > Oh I see, thanks for the clarification. > > Unfortunately this brings me back to same problem I started with: implicit > properties aren't available when managing indexes through the REST api. I > know there is a

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you are making requests in parallel, then it is likely that you see many connections open at a time. They will get cleaned up over time. But if you wish to clean them up explicitly, use httpclient.getHttpConnectionManager()#closeIdleConnections() On Tue, Jan 27, 2009 at 8:22 PM, Walter Underw

question about dismax and parentheses

2009-01-27 Thread surfer10
Hello, dear members. I'm a little bit confused about dismax syntax. As far as I know (and I might be wrong) it supports default query language such as +WORD -WORD. What about parentheses? The title of my doc consists of WORD1 WORD2 WORD3. When I'm trying to search +WORD1 +(WORD2 WORD4) + WORD3 it doe

[dummy question] applying patch

2009-01-27 Thread surfer10
I'm a bit of a noob with the Java compiler, so could you please tell me what tools are used to apply patch SOLR-236 (Field grouping)? Does it need to be applied on current solr-1.3 (and nightly builds of 1.4), or is it already in the box? Which batch file handles Solr compilation in its distribution? -- V

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, This is the only info in the Tomcat log at indexing: Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191 I don't see any other errors in the logs. When I use curl to update I get a success message. and commit

Store limited text

2009-01-27 Thread Gargate, Siddharth
Hi All, Is it possible to store only limited text in the field, say, max 1 MB? The maxFieldLength setting limits only the number of tokens to be indexed, but the complete content is still stored. Thanks, Siddharth
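One client-side workaround, independent of any Solr setting, is to cut the text down before it is sent for indexing. The sketch below is an assumption about how that could be done rather than something proposed in the thread; the function name is made up for illustration. It truncates to a byte budget in UTF-8 without leaving a partial character at the end.

```python
def truncate_utf8(text, max_bytes):
    """Return `text` cut so that its UTF-8 encoding is at most `max_bytes` long."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A cut may land inside a multi-byte character; errors="ignore" drops the
    # incomplete trailing bytes instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    one_mb = 1024 * 1024
    body = "x" * (2 * one_mb)
    print(len(truncate_utf8(body, one_mb).encode("utf-8")))
```

The trade-off is that the indexed tokens are then also limited to the truncated text, which may or may not be acceptable.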

Re: [dummy question] applying patch

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
Since you are asking about a 'batch file', are you using Windows? I recommend using TortoiseSVN to apply the patch. On Wed, Jan 28, 2009 at 10:05 AM, surfer10 wrote: > > i'm a little bit noob in java compiler so could you please tell me what tools > are used to apply patch SOLR-236 (Field grouping), d

Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a patch given for SOLR-883 . On Wed, Jan 28, 2009 at 9:43 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > I shall give a patch today > > On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson > wrote: >> Oh I see, thanks for the clarification. >> >> Unfortunately this brings me back to same problem I

Re: question about dismax and parentheses

2009-01-27 Thread surfer10
I found Hoss's explanations at http://www.nabble.com/Dismax-and-Grouping-query-td12938168.html#a12938168 It seems I can't do this. So my question becomes the following: can I join multiple dismax queries into one? For instance, if I'm looking for +WORD1 +(WORD2 WORD3) it can be translate

Re: Setting dataDir in multicore environment

2009-01-27 Thread Mark Ferguson
This is just what I needed, thank you so much for the quick response! It's really appreciated! Mark On Tue, Jan 27, 2009 at 9:59 PM, Noble Paul നോബിള്‍ नोब्ळ् < noble.p...@gmail.com> wrote: > There is a patch given for SOLR-883 . > > On Wed, Jan 28, 2009 at 9:43 AM, Noble Paul നോബിള്‍ नोब्ळ् >

Re: Store limited text

2009-01-27 Thread Chris Harris
If you're using a Solr build post-r721758, then copyField has a maxChars property you can take advantage of. I'm probably misremembering some of the exact names of these elements/attributes, but you can basically have this in your schema.xml: Then anything you store in field f will get copied

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll wrote: > One of the things I am interested in is the marriage of Solr and Mahout > (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools. [snip] I love it, good to know you are thinking big here. Here's another big thought:

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I am getting this error in the Tomcat log file on passing Chinese text to the content field. The content field uses the CJK tokenizer and is defined as INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=69 Jan 28, 2009 12:17:03 PM org.apache.solr.common.

Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Neal Richter
OK I've implemented this before, written academic papers and patents related to this task. Here are some hints: - you're on the right track with the editorial boosting elevators - http://wiki.apache.org/solr/UserTagDesign - be darn careful about assuming that one click is enough evidence

RE: Customizing Solr to handle Leading Wildcard queries

2009-01-27 Thread Jana, Kumar Raja
Hi, Thanks Otis, Newton and everyone else for the help on this issue. Most of the data I index are documents like pdfs, word Docs, open office documents, etc. I store the content of the document in a field called content and the remaining metadata of the document like name, id, created by, modifi