Re: Features not present in Solr

2010-03-22 Thread David Smiley @MITRE.org
I use Endeca and Solr. A few notable things in Endeca but not in Solr: 1. Real-time search. 2. "related record navigation" (RRN) is what they call it. This is the ability to join in other records, something Lucene/Solr definitely can't do. 3. A reference application for browsing/searching the da

Re: solr-ruby with clustering

2010-03-22 Thread mike anderson
false alarm, on the client side I was specifically setting a shard, and this was causing my query/solr-ruby/solr to think it was a distributed request, which isn't supported by the clustering component. cheers, mike On Mon, Mar 22, 2010 at 8:53 PM, mike anderson wrote: > Has anybody got solr-rub

solr-ruby with clustering

2010-03-22 Thread mike anderson
Has anybody got solr-ruby to return a clustering result? (using the clustering component) I'm almost certain the query is correct (I check the solr logs for the query and run it in my browser, get back the cluster output as expected). But when I dump the response from my solr-ruby query the cluste

Re: synonyms problem

2010-03-22 Thread Lance Norskog
How large is the document, and how often does 'aberrant' appear in it? Are the other words also in the document? What is the full analysis stack? There might be interactions between the SynonymFilter and other filters. What does the admin/analysis.jsp page show? Does it throw OutOfMemory also? D

Re: Features not present in Solr

2010-03-22 Thread Lukáš Vlček
Hmm... sounds pretty much like what this book should be about (once finished): http://www.manning.com/ingersoll/ On Mon, Mar 22, 2010 at 8:46 PM, Lance Norskog wrote: > About Text Analysis: "Natural Language Processing" is the more usual > term. Finding parts of speech, isolating people's names,

Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
About Text Analysis: "Natural Language Processing" is the more usual term. Finding parts of speech, isolating people's names, etc. On Mon, Mar 22, 2010 at 12:27 PM, Israel Ekpo wrote: > On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog wrote: > >> Web crawling. > > > I don't think Solr was designed

Re: Query interface

2010-03-22 Thread Lance Norskog
There are several response formats available for Solr: http://wiki.apache.org/solr/QueryResponseWriter Also, XSLT scripts and Velocity scripts are available for pre-processing output formats. On Mon, Mar 22, 2010 at 9:00 AM, Armando Ota wrote: > Hey ... > > Thank you very much .. been struggling
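A minimal Python sketch of asking for one of those alternative formats through the standard `wt` request parameter. It assumes a Solr 1.4 instance at the default example address http://localhost:8983/solr. The helper names are hypothetical, and the JSON body is a hand-written sample, not captured output.

```python
import json
from urllib.parse import urlencode

def select_url(base, query, wt="json"):
    # wt picks the response writer; "json" asks for JSON instead of the default XML
    return base + "/select?" + urlencode({"q": query, "wt": wt})

def doc_ids(body):
    # Matching documents are listed under response.docs
    data = json.loads(body)
    return [doc["id"] for doc in data["response"]["docs"]]

# Hand-written sample shaped like a JSON response (not captured from a server)
SAMPLE = ('{"responseHeader":{"status":0,"QTime":2},'
          '"response":{"numFound":2,"start":0,"docs":[{"id":"a"},{"id":"b"}]}}')

print(select_url("http://localhost:8983/solr", "solr"))
print(doc_ids(SAMPLE))
```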

Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
Whoops, yes it is in the wiki. A link from the admin page would be welcome. On Mon, Mar 22, 2010 at 12:37 PM, Lance Norskog wrote: > There is a very cool debugger for the DataImportHandler: > > http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport > debug jsp > > It is not

Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
There is a very cool debugger for the DataImportHandler: http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport debug jsp It is not mentioned on the wiki, nor are there any links to it in the Solr admin console. On Mon, Mar 22, 2010 at 8:36 AM, stocki wrote: > > Helloo

Re: Features not present in Solr

2010-03-22 Thread Israel Ekpo
On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog wrote: > Web crawling. I don't think Solr was designed with Web Crawling in mind. Nutch would be better suited for that, I believe. > Text analysis. > This is a bit vague. Please elaborate further. There is a lot of analysis (stemming, sto

Re: Features not present in Solr

2010-03-22 Thread Lukáš Vlček
On Mon, Mar 22, 2010 at 8:16 PM, Lance Norskog wrote: > Web crawling. > Nutch, Lucene Connectors Framework... would it help to include this directly in the Solr code base? > Text analysis. > Under development I think, see Mahout (check some proposed GSoC tickets in JIRA) > Distributed index ma

Re: Question about query

2010-03-22 Thread Erick Erickson
One thing I've seen suggested is to add the number of values to a separate field, say topic_count. Then, in your situation above you could append "AND topic_count:1". This can extend to work if you wanted any number of matches (and only that number). For instance, topic:5 AND topic:10 AND topic:20
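A minimal Python sketch of that idea, assuming the multivalued field is `topic` and the count field is `topic_count` as in the message. Both helper names are hypothetical. Lucene query syntax writes each clause as `field:value`, and the count has to be stored when the document is indexed.

```python
def add_topic_count(doc, field="topic", count_field="topic_count"):
    # Store how many distinct values the multivalued field holds (done at index time)
    doc[count_field] = len(set(doc.get(field, [])))
    return doc

def exact_topics_query(topics, field="topic", count_field="topic_count"):
    # Require every requested value...
    clauses = [f"{field}:{t}" for t in topics]
    # ...and require that the document holds no other values
    clauses.append(f"{count_field}:{len(set(topics))}")
    return " AND ".join(clauses)

doc = add_topic_count({"id": "42", "topic": [5, 10, 20]})
print(doc["topic_count"])               # 3
print(exact_topics_query([5, 10, 20]))  # topic:5 AND topic:10 AND topic:20 AND topic_count:3
```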

Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
Web crawling. Text analysis. Distributed index management. A fanatical devotion to the Pope. On Sun, Mar 21, 2010 at 11:19 PM, MitchK wrote: > > Srikanth, > > I don't know anything about Endeca, so I can't compare Solr to it. > However, I know Solr is powerful. Very powerful. > So, maybe you shou

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-22 Thread stocki
I patched a nightly build of Solr. The patch applies and the classes are in the correct folder, but when I replace the spellcheck component with this spellchecker as described in the comments, Solr cannot find the classes =( suggest org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.j

Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Ahmet Arslan
> Thank you. I tried that but it did > not work to remove trailing spaces. > I believe this is why my size facet queries are not > working. After > reloading, the XML result entries still have: > > > LARGE      > MEDIUM    > SMALL      > > > I am using this: > >     >     class="solr.Stand

Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Willie Whitehead
Thank you. I tried that but it did not work to remove trailing spaces. I believe this is why my size facet queries are not working. After reloading, the XML result entries still have: LARGE MEDIUM SMALL I am using this: And here is my size field: I did no
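Analysis filters such as TrimFilterFactory change only the indexed terms. The stored value returned in the XML results is the original input, so trailing spaces in a stored field survive any analyzer chain. One workaround is to trim the values before they reach Solr: on the client, or with SQL `TRIM()` in the query if the data comes from a database. A minimal Python sketch of the client-side route. It assumes the uniqueKey field is `id` (an assumption) and uses the `size` field from the message. `build_add_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_add_xml(doc_id, field, values):
    """Build a Solr <add> message, trimming whitespace from each multivalued value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for value in values:
        trimmed = value.strip()
        if trimmed:  # skip values that were nothing but whitespace
            ET.SubElement(doc, "field", name=field).text = trimmed
    return ET.tostring(add, encoding="unicode")

print(build_add_xml("1", "size", ["LARGE   ", "MEDIUM  ", "SMALL   "]))
```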

Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Ahmet Arslan
> In my schema.xml, I am trying to remove whitespace from a > multivalued > field as they come from the database. Is this the correct > way: > >     class="solr.TextField"> >       >         class="solr.StandardTokenizerFactory"/> >         class="solr.TrimFilterFactory" /> >       >     >

Correct way to use tokenizer for whitespace

2010-03-22 Thread Willie Whitehead
Hi, In my schema.xml, I am trying to remove whitespace from a multivalued field as they come from the database. Is this the correct way: I do not believe this is working. Thanks!

Re: Multi Select Facets through Java API

2010-03-22 Thread homerlex
With your example I got it working nicely with addFacetField and addFilterQuery in the API. Thanks, I appreciate the help. Britske wrote: > > something like this? > > q=mainquery&fq={!tag=carfq}cars:corvette OR > cars:camaro&facet=on&facet.field={!ex=carfq key=carfacet}cars > > -the facet:

Re: use termscomponent like spellComponent ?!

2010-03-22 Thread stocki
Thanks. I tried to patch Solr with 1316 but it does not work =( Do I need to check out the nightly from SVN? http://svn.apache.org/repos/asf/lucene/solr/ When I create a patch and then build the WAR, it is only 40 MB ... Grant Ingersoll-6 wrote: > > See https://issues.apache.org/jira/browse/SOLR-1

Re: Query interface

2010-03-22 Thread Armando Ota
Hey ... Thank you very much .. been struggling with this for hours now :( Will have to change the feature .. somehow :D Kind regards Armando Abdelhamid ABID wrote: Hi, I think there is nothing better than using XSLT as a means to query Solr and render results. Within an xslt file you would combine

Re: Query interface

2010-03-22 Thread Abdelhamid ABID
Hi, I think there is nothing better than using XSLT as a means to query Solr and render results. Within an XSLT file you would combine the search form with the search results in one place. This way you free the server from the heavy-duty task of XSLT transformation and let the client -which is in the most ca

Re: use termscomponent like spellComponent ?!

2010-03-22 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/SOLR-1316 On Mar 21, 2010, at 2:34 PM, stocki wrote: > > hello. > > i play with solr but i didn't find the perfect solution for me. > > my goal is a search like the amazonsearch from the iPhoneApp. ;) > > it is possible to use the TermsComponent like

Re: Question about query

2010-03-22 Thread Armando Ota
Hey Thank you for your reply .. but it's not working ... I still get other articles Kind regards Armando Abdelhamid ABID wrote: Well, here is what I figured out! (mm=1<50%, qf=topic, q="1" "0") ==> q=topic:0 or topic:1 On 3/22/10, Armando Ota wrote: Hi I need a little help with quer

DIH - Categories not indexed ????

2010-03-22 Thread stocki
Hello. I have the same database as in this example: http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example This is my data-config.xml

Re: Question about query

2010-03-22 Thread Abdelhamid ABID
Well, here is what I figured out! (mm=1<50%, qf=topic, q="1" "0") ==> q=topic:0 or topic:1 On 3/22/10, Armando Ota wrote: > > Hi > > I need a little help with query for my problem (if it can be solved) > > I have a field in a document called topic > > this field contains some values, 0 (for no
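The reasoning above depends on how the dismax `mm` (minimum should match) spec is evaluated. A simplified Python sketch that handles only non-negative forms like `2`, `50%` and a single conditional `N<P%`. Solr's full syntax also allows negative values and several space-separated conditions, which this ignores. `required_clauses` is a hypothetical helper.

```python
def required_clauses(mm, n_clauses):
    """How many optional clauses must match under a simple dismax mm spec.

    Handles only non-negative integers, percentages, and one 'N<spec' condition.
    """
    if "<" in mm:
        threshold, mm = mm.split("<", 1)
        if n_clauses <= int(threshold):
            return n_clauses  # at or below the threshold every clause is required
    if mm.endswith("%"):
        return n_clauses * int(mm[:-1]) // 100  # percentages are rounded down
    return min(int(mm), n_clauses)  # never require more clauses than exist

# Abdelhamid's parameters: q="1" "0" gives two clauses searched against qf=topic
params = {"defType": "dismax", "qf": "topic", "mm": "1<50%", "q": '"1" "0"'}
print(required_clauses(params["mm"], 2))  # 1, i.e. topic:1 OR topic:0
```

With two clauses and `1<50%`, only one clause must match, which is why the request behaves like an OR rather than requiring both topics.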

Re: synonyms problem

2010-03-22 Thread Armando Ota
Have you tried increasing the memory size? We had some out of memory problems when we used the default memory size .. Kind regards Armando michaelnazaruk wrote: Hi all! I have a little problem with synonyms: when I set my synonyms.txt file such as: aberrant=>abnormal,unusual,deviant,anomalous,peculi

synonyms problem

2010-03-22 Thread michaelnazaruk
Hi all! I have a little problem with synonyms: when I set my synonyms.txt file such as: aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical it's all right! But if I set this file such as aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irr
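The two synonyms.txt forms behave differently. An explicit mapping (`a=>b,c`) replaces only the left-hand term. An equivalence list (`a,b,c`) with the factory's `expand="true"` option maps every term in the list to the whole list, so any one of those words is indexed as all of them. The Python code below is a simplified model of those rules, not Solr's SynonymFilter code. `parse_synonym_line` is hypothetical.

```python
def parse_synonym_line(line, expand=True):
    """Return {term: [terms it becomes]} for one synonyms.txt line (simplified model)."""
    if "=>" in line:
        # Explicit mapping: only the left-hand term(s) are rewritten
        lhs, rhs = line.split("=>", 1)
        targets = [t.strip() for t in rhs.split(",")]
        return {t.strip(): targets for t in lhs.split(",")}
    terms = [t.strip() for t in line.split(",")]
    if expand:
        # Equivalence list with expand=true: every term maps to the whole list
        return {t: terms for t in terms}
    # expand=false: every term is reduced to the first term in the list
    return {t: terms[:1] for t in terms}

print(parse_synonym_line("aberrant=>abnormal,unusual"))
print(parse_synonym_line("aberrant,abnormal,unusual"))
```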

Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Ross
I thought you might ask that :-) It's because the pdf files are scanned from paper documents and OCR'd to produce text. They still contain the image so are huge. The smaller files are about 40 MB and cause a Java out of heap memory error. The larger files are getting close to 500 MB. I didn't have

Question about query

2010-03-22 Thread Armando Ota
Hi I need a little help with query for my problem (if it can be solved) I have a field in a document called topic this field contains some values, 0 (for no topic) or 1 (topic 1), 2, 3, etc ... It can contain many values like 1, 10, 50, etc (for 1 doc) So now to the problem: I would like t

Re: Query interface

2010-03-22 Thread Gora Mohanty
On Mon, 22 Mar 2010 15:26:41 +0100 Sebastian Funk wrote: > hey there, > > i've been using solr for some time now and set everything up the > way it's supposed to.. > now for the user interface: simply writing a javascript (or > something else) website that passes the query-URL to solr and > inte

Query interface

2010-03-22 Thread Sebastian Funk
Hey there, I've been using Solr for some time now and set everything up the way it's supposed to be. Now for the user interface: simply writing a JavaScript (or something else) website that passes the query URL to Solr and interprets the XML given as a result. Is that the easiest way? I've no
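Interpreting the default XML response takes only a standard XML parser. The following Python sketch uses a hand-written sample in the usual Solr XML layout: documents sit under `<result name="response">`, and each stored field is a typed element with a `name` attribute. `parse_docs` is a hypothetical helper. It does not handle multivalued `<arr>` fields.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the layout of Solr's XML response writer (not real output)
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">a</str><str name="title">First</str></doc>
    <doc><str name="id">b</str><str name="title">Second</str></doc>
  </result>
</response>"""

def parse_docs(xml_text):
    """Return (numFound, list of {field name: text}) from a Solr XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = [{field.get("name"): field.text for field in doc} for doc in result.findall("doc")]
    return int(result.get("numFound")), docs

num_found, docs = parse_docs(SAMPLE)
print(num_found, [d["title"] for d in docs])  # 2 ['First', 'Second']
```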

Re: Multi Select Facets through Java API

2010-03-22 Thread Geert-Jan Brits
something like this? q=mainquery&fq={!tag=carfq}cars:corvette OR cars:camaro&facet=on&facet.field={!ex=carfq key=carfacet}cars -the facet: "carfacet" is independent of the filter query that filters on cars. -you construct the filter query (fq={!tag=carfq}cars:corvette OR cars:camaro) yourself in
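The same request can be built programmatically. A minimal Python sketch that reproduces those parameters. The tag `carfq`, key `carfacet` and field `cars` come from the message, and `multiselect_params` is a hypothetical helper. In SolrJ the `fq` and `facet.field` strings would go to `addFilterQuery` and `addFacetField`.

```python
from urllib.parse import urlencode

def multiselect_params(main_query, field, selected, tag, key):
    """Build q/fq/facet params so the facet counts ignore the facet's own filter."""
    # OR the selected values together in one filter query, tagged so it can be excluded
    fq = "{!tag=" + tag + "}" + " OR ".join(f"{field}:{v}" for v in selected)
    return [
        ("q", main_query),
        ("fq", fq),
        ("facet", "on"),
        # ex= drops the tagged filter while counting this facet; key= renames it in the response
        ("facet.field", "{!ex=" + tag + " key=" + key + "}" + field),
    ]

params = multiselect_params("mainquery", "cars", ["corvette", "camaro"], tag="carfq", key="carfacet")
print("/select?" + urlencode(params))
```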

Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Erik Hatcher
Why not feed the original PDF files in instead? Just curious if pdftotext is doing a better job than Tika's PDFBox stuff. Erik On Mar 22, 2010, at 9:30 AM, Ross wrote: Thanks Georg I don't think it's that because it crashes on a one word test file I create using the nano editor. I

Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Ross
Thanks Georg I don't think it's that because it crashes on a one word test file I create using the nano editor. I don't think nano is adding anything extra. My real files are created by a Windows utility called pdftotext. I solved the problem by getting pdftotext to generate html files rather tha

Re: Multi Select Facets through Java API

2010-03-22 Thread homerlex
bump - anyone?

Solr crashing while extracting from very simple text file

2010-03-22 Thread György Frivolt
Hi, I had a problem with indexing documents some months ago as well. I found that there were XML control characters in the documents and these were not handled by Solr. Maybe that is the case for you as well. Regards, Georg On Sun, Mar 21, 2010 at 5:58 PM, Ross wrote: > Hi all > > I'm tr
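If control characters are the cause, one client-side workaround is to drop every character that XML 1.0 does not allow before the text is sent to Solr. A minimal Python sketch. The ranges follow the XML 1.0 `Char` production, and `strip_invalid_xml_chars` is a hypothetical helper.

```python
import re

# Anything outside the XML 1.0 "Char" production: most C0 control characters and lone surrogates
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)

print(repr(strip_invalid_xml_chars("page\x0cbreak\x00 ok\tand\nnewline")))
```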

Re: MLT question

2010-03-22 Thread Marc Sturlese
> My question is how can I paginate the results of this query? For example > instead of setting rows you must specify mlt.count in the params. But how > can I set the offset? mlt.offset? As you do in a not mlt search request, setting start param should paginate your response results blargy wrote
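A minimal Python sketch of page-based offsets. It assumes the request goes to a MoreLikeThisHandler, where `start` and `rows` page through the similar documents as described above. The `mlt.fl` field name `text` and the helper name are hypothetical.

```python
def mlt_page_params(query, page, page_size, similarity_fields="text"):
    """Request parameters for one 1-based page of MoreLikeThis results."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "q": query,                       # selects the source document
        "mlt.fl": similarity_fields,      # hypothetical field(s) used for similarity
        "start": (page - 1) * page_size,  # offset of the first result on this page
        "rows": page_size,
    }

print(mlt_page_params("id:42", page=3, page_size=10)["start"])  # 20
```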

Re: distributed solr and tf-idf

2010-03-22 Thread Koji Sekiguchi
Pooja Verlani wrote: Hi, How good is distributed Solr's tf-idf across shards (if it works at all with Solr 1.4)? Is there a chance of it getting better? I have to implement a huge index with many shards. How is it possible to get a global tf-idf for the same, any ideas? Regards, Pooja Distri

Index field untokenized

2010-03-22 Thread Alessandro Falasca (KCTP)
Hi All, I want to index some data untokenized (e.g. url), but I can't find a way to do it. I know there is a way to do it in the Solr configuration, but I want to specify these options directly in my Solr XML. This is a fragment of the XML that I post to Solr and I want to know if it is possible to add to

distributed solr and tf-idf

2010-03-22 Thread Pooja Verlani
Hi, How good is distributed Solr's tf-idf across shards (if it works at all with Solr 1.4)? Is there a chance of it getting better? I have to implement a huge index with many shards. How is it possible to get a global tf-idf for the same, any ideas? Regards, Pooja
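When each shard scores with its own document frequencies, the same term can get quite different weights on different shards. The classic Lucene `DefaultSimilarity` computes idf(t) = 1 + ln(numDocs / (docFreq + 1)). The following Python illustration uses made-up counts, not measurements.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Lucene DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-shard (docFreq, numDocs) for one term: common on shard 0, rare on shard 1
shards = [(1000, 100_000), (10, 100_000)]

local = [classic_idf(df, n) for df, n in shards]
# A global idf would use the statistics summed over all shards
global_idf = classic_idf(sum(df for df, _ in shards), sum(n for _, n in shards))

for i, idf in enumerate(local):
    print(f"shard {i}: idf={idf:.3f}")
print(f"global: idf={global_idf:.3f}")
```

A global value requires collecting docFreq and numDocs from every shard before scoring. As far as I know, Solr 1.4 distributed search does not do that on its own.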