Re: highlighting not working with Solr 3.0 trunk?

2011-01-14 Thread Chris Hostetter
: and ran ant there. I've followed the tutorial but
: highlighting on analyzer debug screen isn't working.
Yep, thanks for reporting this, I've opened a bug to track it...
https://issues.apache.org/jira/browse/SOLR-2315
-Hoss

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-14 Thread Chamnap Chhorn
Ahh, thanks guys for helping me! Adam's solution doesn't work for me. Here is my Field, FieldType, and Solr query: http://localhost:8081/solr/select?q=printing%20house&qf=keyphrase&debugQuery=on&defType=dismax +((DisjunctionMaxQuery((keyphrase:smart)) Di

Re: MaxRows and disabling sort

2011-01-14 Thread Chris Hostetter
: Also I guess default sorting is on Scoring and sorting can only be done once : it has the scores of all matches so then limiting it to the max rows becomes : useless. So if there a way to disable sorting? e.g. it returns the rows as : it finds without any order? http://wiki.apache.org/solr/Comm

Re: MaxRows and disabling sort

2011-01-14 Thread Salman Akram
In some cases my search takes too long. Now I want to show the user partial matches if it's taking too long. The problem with timeAllowed is that, let's say I set its value to 10 secs, then for some queries it would be fine and would at least return a few hundred rows, but in the worst scenarios it might n
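For readers following this thread: timeAllowed is given in milliseconds, so the 10 seconds mentioned above would be timeAllowed=10000, and a search that hits the limit is flagged with partialResults in the response header. A minimal sketch of that, assuming a Solr instance at a hypothetical http://localhost:8983/solr and a JSON response already parsed into a dict (the helper names are made up for illustration):

```python
from urllib.parse import urlencode


def search_url(solr_base, query, rows, time_allowed_secs):
    """Build a select URL with a time budget (timeAllowed is in milliseconds)."""
    params = {
        "q": query,
        "rows": rows,
        "timeAllowed": int(time_allowed_secs * 1000),
        "wt": "json",
    }
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def is_partial(response):
    """Return True if a parsed Solr response reports that the search was cut short."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical host and query, for illustration only.
url = search_url("http://localhost:8983/solr", "printing house", 100, 10)
```

This only shows how the time budget is passed and detected; it does not stop the search after a fixed number of matches, which is what the original question asks for.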

Re: MaxRows and disabling sort

2011-01-14 Thread Erick Erickson
Why do you want to do this? That is, what problem do you think would be solved by this? Because there are other problems if you're trying to, say, return all rows that match. But no, there's nothing that I know of that would do what you want (of course that doesn't mean there isn't). Best Eric

Re: solr speed issues..

2011-01-14 Thread Erick Erickson
You haven't given us much information here, it might help to review: http://wiki.apache.org/solr/UsingMailingLists In addition to Kenf_nc's comments, your sorting may be an issue, especially if you're measuring the first query times. What does debugQuery=on show? How many docs in your index? How

Re: DataImportHandler: full import of a single entity

2011-01-14 Thread Ahmet Arslan
> I've got a DataImportHandler set up
> with 5 entities. I would like to do a full
> import on just one entity. Is that possible?
Yes, there is a parameter named entity for that. solr/dataimport?command=full-import&entity=myEntity
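The request above can be sketched as a small URL builder. This is only an illustration: the base URL is hypothetical, and the clean parameter is an addition not mentioned in the reply. As far as I know, DIH's clean defaults to true on full-import and wipes the index first, so the sketch sets it to false unless asked otherwise; please verify against your Solr version.

```python
from urllib.parse import urlencode


def dih_full_import_url(solr_base, entity, clean=False):
    """Build a DataImportHandler full-import URL restricted to one entity."""
    params = {
        "command": "full-import",
        "entity": entity,
        # Assumption: clean=true would delete existing documents before importing.
        "clean": "true" if clean else "false",
    }
    return solr_base.rstrip("/") + "/dataimport?" + urlencode(params)


# Hypothetical base URL; "myEntity" is the entity name used in the reply above.
import_url = dih_full_import_url("http://localhost:8983/solr", "myEntity")
```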

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-14 Thread Erick Erickson
This might work: Define your field to use WhitespaceTokenizer and LowerCaseFilterFactory. Use a filter query referencing this field. If you wanted the words to appear in their exact order, you could just define the "pf" field in your dismax. Best Erick On Thu, Jan 13, 2011 at 8:01 PM, Estrada G

MaxRows and disabling sort

2011-01-14 Thread Salman Akram
Hi, I want to limit my SOLR results so that it stops further searching once it finds a certain number of records (just like 'limit' in MySQL). I know it has a timeAllowed property but is there anything like MaxRows? I am NOT talking about the 'rows' attribute which returns a specific no. of rows to cl

DataImportHandler: full import of a single entity

2011-01-14 Thread Jon Drukman
I've got a DataImportHandler set up with 5 entities. I would like to do a full import on just one entity. Is that possible? I worked around it temporarily by hand editing the dataimport.properties file and deleting the delta line for that one entity, and kicking off a delta. But for (hopefully)

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Michael McCandless
OK given that you're seeing non-deterministic results on the same index... I think this is likely a hardware issue or a JRE bug? If you move that index over to another env and run CheckIndex, is it consistent? Mike On Fri, Jan 14, 2011 at 9:00 AM, Stéphane Delprat wrote: > So I ran checkIndex (

No system property or default value specified for...

2011-01-14 Thread Tanner Postert
I'm trying to dynamically add a core to a multi core system using the following command: http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true the data-config.xml looks like this: t

Re: Variable datasources

2011-01-14 Thread tjpoe
I was actually able to figure this out using a slightly different method. Since the databases exist on the same server, I simply made a single datasource with no database selected: then in the queries, I qualify using the full database notation: database.table rather than just table

Re: boilerpipe solr tika howto please

2011-01-14 Thread Ken Krugler
Hi Arno, On Jan 14, 2011, at 3:57am, arnaud gaudinat wrote: Hello, I would like to use BoilerPipe (a very good program which cleans the html content from surplus "clutter"). I saw that BoilerPipe is inside Tika 0.8 and so should be accessible from solr, am I right? How I can Activate Boi

Re: boilerpipe solr tika howto please

2011-01-14 Thread Adam Estrada
There is another way to ingest data using DIH. Check out the HTMLStripTransformer http://www2c.cdc.gov/podcasts/createrss.asp?t=r&c=19"; processor="XPathEntityProcessor" forEach="/rss/channel | /rss/channel/item" transformer="DateFormatTransformer,HTMLStripTransformer

Re: Improving Solr performance

2011-01-14 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 1:56 PM, supersoft wrote: > > The tests are performed with a selfmade program. [...] May I ask in what language the program is written? The reason for asking is to eliminate the possibility that there is an issue with the threading model, e.g., if you were using Python

Re: boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
I just saw TagSoup and it seems to clean bad HTML tags to create a good HTML file. What BoilerPipe does is try to eliminate HTML content which is not part of the useful content for a human reader (i.e. navigation contents, ads, comments...). Take a look here: http://boilerpipe-web.appspot.com/

Re: boilerpipe solr tika howto please

2011-01-14 Thread Adam Estrada
Is there a drastic difference between this and TagSoup which is already included in Solr? On Fri, Jan 14, 2011 at 6:57 AM, arnaud gaudinat wrote: > Hello, > > I would like to use BoilerPipe (a very good program which cleans the html > content from surplus "clutter"). > I saw that BoilerPipe is in

Re: Tika Update, no Data

2011-01-14 Thread arnaud gaudinat
On 14.01.2011 16:28, Jörg Agatz wrote: If I understood your problem correctly, try: so with stored="true" to get back the content. Arnaud

Tika Update, no Data

2011-01-14 Thread Jörg Agatz
hey... i work with tika and Solr, at the Moment, i can index Dokument information but nur content.. to the details: part of my config: last_modified true text true ignored_ true links ignored_ Part of my Schema: curl command: curl " http://192.168.105.66:8983/solr

Re: Query : FAQ? Forum?

2011-01-14 Thread kenf_nc
http://wiki.apache.org/solr/FrontPage Solr Wiki http://wiki.apache.org/solr/FAQ Solr FAQ http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&qid=1295018231&sr=8-1 A good book on Solr And this forum you posted to http://lucene.472066.n3.nabble.com/Solr-User

Re: solr speed issues..

2011-01-14 Thread kenf_nc
Can you reduce the number of docs and do paging (start=0,rows=50; start=50,rows=50; ...etc)? That might help a little. This is also a factor of how much data each doc has, whether you have any fields that are compressed, etc. Also, make sure you have enough memory set aside for your cache (not on
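The paging suggestion above amounts to walking the result set in fixed windows of start and rows. A minimal sketch, assuming a page size of 50 as in the reply; the helper name and the total of 120 documents are made up for illustration:

```python
def page_windows(total_docs, rows=50):
    """Yield start/rows parameter dicts that cover total_docs results page by page."""
    for start in range(0, total_docs, rows):
        yield {"start": start, "rows": rows}


# Example: 120 matching documents, fetched 50 at a time.
windows = list(page_windows(120))
```

Each dict can be merged into the rest of the query parameters, so only one page of documents is fetched per request instead of the whole result set.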

Re: LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
Hi Stefan, thanks a lot. Regards, Bernd Am 14.01.2011 15:25, schrieb Stefan Matheis: > Hi Bernd, > > there is an explanation from Hoss: > http://search.lucidimagination.com/search/document/149e7d25415c0a36/some_kind_of_crazy_histogram#b22563120f1ec32b > > HTH > Stefan > > On Fri, Jan 14, 201

Re: LukeRequestHandler histogram?

2011-01-14 Thread Stefan Matheis
Hi Bernd, there is an explanation from Hoss: http://search.lucidimagination.com/search/document/149e7d25415c0a36/some_kind_of_crazy_histogram#b22563120f1ec32b HTH Stefan On Fri, Jan 14, 2011 at 3:15 PM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > Dear list, > > what is the LukeRequ

LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
Dear list, what is the LukeRequestHandler histogram telling me? Couldn't find any explanation and would be pleased to have it explained. Many thanks in advance, Bernd

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Stéphane Delprat
So I ran checkIndex (without -fix) 5 times in a row : SOLR was running, but no client connected to it. (just the slave which was synchronizing every 5 minutes) summary : 1: all good 2: 2 errors: (seg 1 & 2) terms, freq, prox...ERROR [term blog_id:104150: doc 324697 <= lastDoc 324697] & terms

Is deduplication possible during Tika extract?

2011-01-14 Thread arnaud gaudinat
Hello, here is an excerpt of my solrconfig.xml: class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy"> dedupe text true ignored_ true links ignored_ and class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> true signature false text

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Jörg Agatz
No, I don't know. That is the request handler: last_modified true text true ignored_ true links ignored_ and I start it like this: curl " http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Nutch can crawl the file system as well. Nutch 1.x can also provide search but this is delegated to Solr in Nutch 2.x. Solr can provide the search and Nutch can provide Solr with content from your intranet. On Friday 14 January 2011 13:17:52 Cathy Hemsley wrote: > Hi, > Thanks for suggesting thi

Re: Adding a new site to existing solr configuration

2011-01-14 Thread PeterKerk
Awesome! thx! :) -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-new-site-to-existing-solr-configuration-tp2249223p2255160.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Toke Eskildsen
On Fri, 2011-01-14 at 13:05 +0100, Cathy Hemsley wrote: > I hope you can help. We are migrating our intranet web site management > system to Windows 2008 and need a replacement for Index Server to do the > text searching. I am trying to establish if Lucene and Solr is a feasible > replacement, bu

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Please visit the Nutch project. It is a powerful crawler and can integrate with Solr. http://nutch.apache.org/ > Hi Solr users, > > I hope you can help. We are migrating our intranet web site management > system to Windows 2008 and need a replacement for Index Server to do the > text searching

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Stefan Matheis
Pass a value for your id field, as you already do for all the other fields? http://search.lucidimagination.com/search/document/ca95d06e700322ed/missing_required_field_id_using_extractingrequesthandler On Fri, Jan 14, 2011 at 12:59 PM, Jörg Agatz wrote: > ok, now in the 4 test, it works ? ok..

Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Cathy Hemsley
Hi Solr users, I hope you can help. We are migrating our intranet web site management system to Windows 2008 and need a replacement for Index Server to do the text searching. I am trying to establish if Lucene and Solr is a feasible replacement, but I cannot find the answers to these questions:

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Michael McCandless
Right, but removing a segment out from under a live IW (when you run CheckIndex with -fix) is deadly, because that other IW doesn't know you've removed the segment, and will later commit a new segment infos still referencing that segment. The nature of this particular exception from CheckIndex is

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Jörg Agatz
OK, now in the 4th test it works? OK.. I don't know... it works.. but now I have another problem: I can't send content to the server. When I send content to Solr I get: Error 400 HTTP ERROR: 400 Document [null] missing required field: id RequestURI=/solr/update/extract http://jetty.mortb

boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
Hello, I would like to use BoilerPipe (a very good program which cleans the html content from surplus "clutter"). I saw that BoilerPipe is inside Tika 0.8 and so should be accessible from solr, am I right? How I can Activate BoilerPipe in Solr? Do I need to change solrconfig.xml ( with org.

Re: Query : FAQ? Forum?

2011-01-14 Thread Stefan Matheis
What about http://search.lucidimagination.com/search/#/p:solr ? :) On Fri, Jan 14, 2011 at 12:45 PM, Cathy Hemsley < cathy.hems...@converteam.com> wrote: > Hi, > > I am trying to get Solr installed and working: and have some queries: is > there a FAQ or a Forum? How do I search to see whether

Query : FAQ? Forum?

2011-01-14 Thread Cathy Hemsley
Hi, I am trying to get Solr installed and working: and have some queries: is there a FAQ or a Forum? How do I search to see whether someone has already asked my question and answered it? Regards Cathy -- Converteam UK Ltd. Registration Number: 5571739 and Converteam Ltd. Registration Number

solr speed issues..

2011-01-14 Thread saureen
I am working on an application that requires fetching results from Solr based on a date parameter. Earlier I was using sharding to fetch the results, but that was making things too slow, so instead of sharding I queried three different cores with the same parameters and merged the results. Still th

Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Jörg Agatz
Hello, I want to index full-text documents, and I read that Tika is a good idea :-) so I tried the how-to from lucidimagination ( http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika ). First of all, I installed Maven2 and mvn Tika; I have tested Tika in shell

Re: facet result issue

2011-01-14 Thread dhanesh
Hi... Wow..very nice..:-) :-) It worked.. Thanks a lot. On 1/14/2011 4:10 PM, david.dankwe...@ubs.com wrote: Try facet.mincount=1; this will ensure you get only values bigger than 0. So, for example, the search would be q=district:A&facet.field=city&facet.mincount=1 Or via the Java API: SolrQuery solrQuery

RE: facet result issue

2011-01-14 Thread david.dankwerth
Try facet.mincount=1; this will ensure you get only values bigger than 0. So, for example, the search would be q=district:A&facet.field=city&facet.mincount=1 Or via the Java API: SolrQuery solrQuery = new SolrQuery("district:A").addFacetField("city").setFacetMinCount(1) -Original Message- From:
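The same request can be assembled outside of SolrJ. A minimal sketch, assuming a hypothetical Solr at http://localhost:8983/solr; note that facet=true is added here as an assumption, since faceting normally has to be switched on explicitly and the example query in the reply does not show it:

```python
from urllib.parse import urlencode


def facet_query_url(solr_base, query, facet_field, min_count=1):
    """Build a select URL that facets on one field and hides zero-count values."""
    params = {
        "q": query,
        "facet": "true",  # assumption: not shown in the reply, but needed to enable faceting
        "facet.field": facet_field,
        "facet.mincount": min_count,
    }
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


# District "A" and field "city" come from the thread; the host is hypothetical.
facet_url = facet_query_url("http://localhost:8983/solr", "district:A", "city")
```

With facet.mincount=1, cities j and k (which belong to district B) would no longer be listed with a count of 0 when searching district A.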

facet result issue

2011-01-14 Thread dhanesh
Hi, I have trouble with facet search. Here is the scenario: I have 2 districts, say A and B. Under district A there are two cities, say x and y, and under district B there are another two cities, say j and k. When I try to search lists under district A, the search result works fine. Facet also worked.

Re: Improving Solr performance

2011-01-14 Thread Toke Eskildsen
On Thu, 2011-01-13 at 17:40 +0100, supersoft wrote: > Although most of the queries are cache hits, the performance is still > dependent of the number of simultaneous queries: > > 1 simultaneous query: 3437 ms (cache fails) Average response time: 3437 ms Throughput: 0.29 queries/sec > 2 simultane

Re: Schema design FAQs/questions

2011-01-14 Thread Stefan Matheis
Matthias, will try to give you a few answers :) It's completely okay, when various documents don't use all fields .. that's how solr works. For Sorting related to missing Values, you might Search for sortMissingLast / sortMissingFirst . The second paragraph sounds like http://wiki.apache.org/sol

Schema design FAQs/questions

2011-01-14 Thread Matthias Pigulla
Dear Solr-users, is there a compilation of FAQs particularly targeting schema design? I have two questions that probably have been asked before: - I have to map different kinds of documents into my schema. Some of these documents have one or multiple time/dates that might be relevant for q

Re: Solr 4.0 => Spatial Search - How to

2011-01-14 Thread Stefan Matheis
absolutely no idea why it is a blob .. but the following one works as expected: CAST( CONCAT( lat, ',', lng ) AS CHAR ) HTH Stefan On Fri, Jan 14, 2011 at 9:31 AM, caman wrote: > > > CONCAT(CAST(lat as CHAR),',',CAST(lng as CHAR)) > -- > View this message in context: > http://lucene.472066.n3.n
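If the cast cannot be done on the MySQL side, the "lat,lng" string could also be built on the client before indexing. A minimal sketch, assuming the 'location' field expects a comma-separated "lat,lng" value as discussed in this thread; the function name and the range checks are additions for illustration:

```python
def to_location(lat, lng):
    """Format a latitude/longitude pair as the 'lat,lng' string used for a location field."""
    lat, lng = float(lat), float(lng)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    return f"{lat},{lng}"


# Example coordinates, for illustration only.
location = to_location(52.5, 13.4)
```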

Re: Searchers and Warmups

2011-01-14 Thread Tommaso Teofili
Hi David, The idea is that you can define some "listeners" which make a list of queries to an IndexSearcher. In particular the firstSearcher event is related to the very first IndexSearcher being created inside the Solr instance while the newSearcher is the event related to the creation of a new In

Re: Searchers and Warmups

2011-01-14 Thread Savvas-Andreas Moysidis
Hi David, maybe the wiki page on caching could be helpful: http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners Regards, - Savvas On 14 January 2011 00:08, David Cramer wrote

Solr and Ping PHP

2011-01-14 Thread stockii
Hello. I am using NRT, and for each search request, updater request and commit request (on the search instance) I start a ping to Solr with an httpRequest. But sometimes the ping isn't okay even though Solr is available. Why can't Solr answer the ping when it is doing something like a commit on my searcher or when a sear

Re: Dismax, Sharding and Elevation

2011-01-14 Thread Oliver Marahrens
Hi, thank you for your reply, Grijesh. But Elevation in general works with sharding - if I use the Standard Request Handler instead of Dismax. I just wonder how (or if) it could also work with dismax. I think it's not a problem of distributed search, but one of dismax (perhaps combined with distri

Re: Solr 4.0 => Spatial Search - How to

2011-01-14 Thread caman
CONCAT(CAST(lat as CHAR),',',CAST(lng as CHAR))

Re: Solr 4.0 => Spatial Search - How to

2011-01-14 Thread Stefan Matheis
caman, how did you try to concat them? perhaps some typecasting would do the trick? Stefan On Fri, Jan 14, 2011 at 7:20 AM, caman wrote: > > Thanks > Here was the issues. Concatenating 2 floats(lat,lng) at mysql end converted > it to a BLOB. Indexing would fail in storing BLOB in 'location' typ

Re: Improving Solr performance

2011-01-14 Thread supersoft
The tests are performed with a self-made program. The arguments are the number of threads and the path to a file which contains available queries (in the last test only one). When each thread is created, it gets the current date (in milliseconds), and when it gets the response from the query, the threa

Re: DataimportHandler development issue

2011-01-14 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 12:17 AM, Derek Werthmuller wrote: > Its not clear why its not working.  Advice? > Also is this the best way to load data?  We intent on loading several > thousand docbook documents once we understand how this all works.  We stuck > with the rss/atom example since we didn'