Re: Autocompletion with Solritas

2010-06-18 Thread Chantal Ackermann
Hi,

here is my solution. It has been some time since I last looked at it,
but it works fine. :-)

<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery-1.4.min.js&contentType=text/javascript"></script>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery-ui.js&contentType=text/javascript"></script>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript"></script>
<link rel="stylesheet" type="text/css" href="/solr/epg/admin/file?file=/velocity/jquery.autocomplete.css&contentType=text/css"/>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery.json-2.2.min.js&contentType=text/javascript"></script>

<script type="text/javascript">
$(function() {
$("#qterm").autocomplete('/solr/epg/suggest', {
extraParams: {
'terms.prefix': function() { return $("#qterm").val(); }
},
hightlight: false,
max: 30,
formatItem: function(row, i, n) {
return row;
},
parse: function(data) {
var json =  jQuery.secureEvalJSON(data);
var terms = json.terms;
var suggMap = terms[1];
var suggest = [];
var j = 0;
for (i=0; i < suggMap.length; i += 2) {
suggest[j++] = suggMap[i];
}
return suggest;
}
});
});
</script>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
    <str name="wt">json</str>
    <str name="terms.fl">suggestsrc</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

suggestsrc is of type solr.TextField, accumulated from different source
fields.
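
(With this in place, the widget issues requests like
/solr/epg/suggest?terms.prefix=<typed text> as you type, and the parse
callback above turns the JSON terms response into the plain list of
suggestions.)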

Cheers,
Chantal



SolrQuery and escaping special characters

2010-06-18 Thread Paolo Castagna

Hi,
I am using Solr v1.4 and SolrJ on the client side.

I am not sure how SolrJ behaves regarding "escaping" special characters
[1] in a query string.

SolrJ does URL encoding of the query string it sends to Solr.

Do I need to escape special characters [1] when I construct a SolrQuery
object or not?

For example, if I want to search for "http://example.com#foo" in a
"uri" field, should I use:

 (a)  SolrQuery query = new SolrQuery("uri:http://example.com#foo");
 (b)  SolrQuery query = new SolrQuery("uri:http\\://example.com#foo");

which become respectively:

 (a') q=uri%3Ahttp%3A%2F%2Fexample.com%23foo
 (b') q=uri%3Ahttp%5C%3A%2F%2Fexample.com%23foo

My understanding is that SolrJ users are supposed to escape special 
characters, therefore (b) is the correct way.


If this is the case, what's the best way to escape a query string which
might contain field names and URIs in their field values?

Thanks,
Paolo

 [1] 
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping%20Special%20Characters


Re: SolrQuery and escaping special characters

2010-06-18 Thread Ahmet Arslan
> My understanding is that SolrJ users are supposed to escape
> special characters, therefore (b) is the correct way.
> If this is the case, what's the best way to escape a query
> string which
> might contain field names and URIs in their field values?

Easiest thing is to use RawQParserPlugin or FieldQParserPlugin.

e.g. SolrQuery.setQuery("{!field f=uri}http://www.apache.org")
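
If you do stick with the lucene query parser instead (e.g. to mix several
fields in one query string), you can avoid escaping by hand; a minimal,
untested sketch, assuming SolrJ's ClientUtils helper, which escapes the
query parser's special characters:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// Escape only the field *value*, not the whole query string, so the
// field separator ':' keeps its query-syntax meaning.
String uri = "http://example.com#foo";
SolrQuery query = new SolrQuery("uri:" + ClientUtils.escapeQueryChars(uri));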



custom scoring phrase queries

2010-06-18 Thread Marco Martinez
Hi,

I want to know if it's possible to get a higher score in a phrase query when
the match is on the left side of the field. For example:


doc1=name:stores peter john
doc2=name:peter john stores
doc3=name:peter john something

if you do a search with name="peter john", the result set I want to get is:

doc2
doc3
doc1

because the terms "peter john" are on the left side of the field and
should get a higher score.

Thanks in advance,


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Autocompletion with Solritas

2010-06-18 Thread Erik Hatcher

Yup, that's basically what I've done too.

I didn't touch the example solrconfig, though putting the params in  
the request handler is the better way, as you have.


Erik


On Jun 18, 2010, at 3:32 AM, Chantal Ackermann wrote:


Hi,

here is my solution. It has been some time since I last looked at it,
but it works fine. :-)

<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery-1.4.min.js&contentType=text/javascript"></script>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery-ui.js&contentType=text/javascript"></script>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript"></script>
<link rel="stylesheet" type="text/css" href="/solr/epg/admin/file?file=/velocity/jquery.autocomplete.css&contentType=text/css"/>
<script type="text/javascript" src="/solr/epg/admin/file?file=/velocity/jquery.json-2.2.min.js&contentType=text/javascript"></script>

<script type="text/javascript">

$(function() {
$("#qterm").autocomplete('/solr/epg/suggest', {
extraParams: {
'terms.prefix': function() { return $("#qterm").val(); }
},
hightlight: false,
max: 30,
formatItem: function(row, i, n) {
return row;
},
parse: function(data) {
var json =  jQuery.secureEvalJSON(data);
var terms = json.terms;
var suggMap = terms[1];
var suggest = [];
var j = 0;
for (i=0; i < suggMap.length; i += 2) {
suggest[j++] = suggMap[i];
}
return suggest;
}
});
});
</script>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
    <str name="wt">json</str>
    <str name="terms.fl">suggestsrc</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>


suggestsrc is of type solr.TextField, accumulated from different source
fields.

Cheers,
Chantal





How to LOCK the index-dir for changes from the IndexWriter

Hi,
I'm writing a RequestHandler that manages backups of the index directory.

Yes, I know there's already the replication RequestHandler that is also capable
of creating backups, but I want the backup process to do some more work and
not to depend on index commit points as the ReplicationHandler does.
The reason is that we are running a custom DataImportHandler that starts a
full-import with the option clean=false every 30 secs (that means it's making
changes to the current index every 30 secs). Also, the importer process never
sends a commit (because that made the whole server stall for a few secs).

So I simply want to lock the index directory with the same mechanism that is
used when the index gets optimized. I took a look at DirectUpdateHandler2 /
IndexWriter etc. to find out what kind of lock is used but was not successful
in implementing it in my RequestHandler :(

The big advantage of locking the index dir would be that I don't have to stop
the index-updating process. The import would automatically hold on until the
lock is removed and the backup is finished.

Regards, Alex


-- 
Alexander Rothenberg
Fotofinder GmbH         USt-IdNr. DE812854514
Software Entwicklung    Web: http://www.fotofinder.net/
Potsdamer Str. 96       Tel: +49 30 25792890
10785 Berlin            Fax: +49 30 257928999

Geschäftsführer: Ali Paczensky
Amtsgericht:     Berlin Charlottenburg (HRB 73099)
Sitz:            Berlin


Nested table support ability


I want to use Solr for free text and parameter based search. I need help to
validate if Solr can help me in achieving following requirement.

Say I have two database tables having one to many relationships.
1. Customer - Customer Id, Customer Name, Profile
2. Role - Role Type, Start date, End Date, Customer Id(Foreign key to
Customer table)

The start and end date of the role decide the effectiveness of the role.
Logic for role effectiveness: Current Date > Start Date and Current Date <=
End Date

Say I have following records in tables
-- Customer Table
Customer Id    Customer Name    Profile
1              David            Some text
:
:
-- Role Table
Customer Id    Role Type    Start date    End Date
1              ADMIN        01/01/2000    01/01/2001
1              OPERATOR     01/01/2009    01/01/2010
:
:
 
 If my search criteria is:
 Get me all customers playing the role of ADMIN as of the current date.

 Expected result - No customer records should be returned, as the current
date (06/18/2010) is greater than the "End Date" of the ADMIN role record.

 I want to create an instance of “Document” per customer record and want to
have the nested table relations embedded in the document.

 Please help me understand if such a requirement is possible to achieve in
Solr.
 
 Thanks in advance.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905228p905228.html
Sent from the Solr - User mailing list archive at Nabble.com.


finding out why a document is in the result

Hi,

We want to tell the user why a document is in the result set. The first 
solution that came to mind here is using highlighting. However we do not really 
need to present the highlighted text, we just want to present the user a list 
of fields where we had matches (for example "address, hobbies, skills"). 
Furthermore, with highlighting we are forced to set the given fields to stored,
which doesn't make sense, especially for one very long text field.

Is there some way I am missing to get just the list of fields per document in 
the result in which the query matched without having to set the given fields to 
stored?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





RE: finding out why a document is in the result

Hi Lukas,

Have you tried setting the debug mode (debugQuery=on)?
It provides very detailed info about the scoring, it might even be too
much for a regular user but for us it was very helpful at times.

Regards,
Tom

-Original Message-
From: Lukas Kahwe Smith [mailto:m...@pooteeweet.org] 
Sent: Friday, 18 June 2010 11:57
To: solr-user@lucene.apache.org
Subject: finding out why a document is in the result

Hi,

We want to tell the user why a document is in the result set. The first
solution that came to mind here is using highlighting. However we do not
really need to present the highlighted text, we just want to present the
user a list of fields where we had matches (for example "address,
hobbies, skills"). Furthermore with highlighting we are forced to set
the given fields to stored, which doesnt make sense especially for one
very long text field.

Is there some way I am missing to get just the list of fields per
document in the result in which the query matched without having to set
the given fields to stored?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: finding out why a document is in the result


On 18.06.2010, at 12:00, Fornoville, Tom wrote:

> Hi Lukas,
> 
> Have you tried setting the debug mode (debugQuery=on)?
> It provides very detailed info about the scoring, it might even be too
> much for a regular user but for us it was very helpful at times.


Yeah, that was the second thing I looked at. It doesn't really contain the
info required, plus it's obviously quite slow too.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org






Re: Autocompletion with Solritas


Looks like a typo below, Chantal, and another comment below too...

On Jun 18, 2010, at 3:32 AM, Chantal Ackermann wrote:

$(function() {
$("#qterm").autocomplete('/solr/epg/suggest', {
extraParams: {
'terms.prefix': function() { return $("#qterm").val(); }
},
hightlight: false,


hightlight?  highlight :)


parse: function(data) {
var json =  jQuery.secureEvalJSON(data);
var terms = json.terms;
var suggMap = terms[1];
var suggest = [];
var j = 0;
for (i=0; i < suggMap.length; i += 2) {
suggest[j++] = suggMap[i];
}
return suggest;
}

This is one of the beauties of the VelocityResponseWriter, freeing the
client from having to deal with a Solr data structure.  In my work, I
did this with a small Velocity template for the response. This makes a
terms component request return exactly what the suggest component likes
natively, suggestions textually one per line:

   ipod
   in

Erik


Re: federated / meta search


Hi Joe & Markus,

sounds good! Maybe I should add a note to the Wiki page on
federated search [1].


Thanks,
Sascha

[1] http://wiki.apache.org/solr/FederatedSearch

Joe Calderon wrote:

Yes, you can use distributed search across shards with different
schemas as long as the query only references overlapping fields. I
usually test adding new fields or tokenizers on one shard and deploy
only after I've verified it's working properly.
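
(A sharded request then looks something like this, with hypothetical hosts:
http://host1:8983/solr/select?q=title:foo&shards=host1:8983/solr,host2:8983/solr )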

On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma  wrote:

Hi,



Check out Solr's sharding [1] capabilities. I never tested it with different
schemas, but if each node is queried with fields that it supports, it should
return useful results.



[1]: http://wiki.apache.org/solr/DistributedSearch



Cheers.

-Original message-
From: Sascha Szott
Sent: Thu 17-06-2010 19:44
To: solr-user@lucene.apache.org;
Subject: federated / meta search

Hi folks,

if I'm seeing it right, Solr currently does not provide any support for
federated / meta searching. Therefore, I'd like to know if anyone has
already put efforts into this direction? Moreover, is federated / meta
search considered a scenario Solr should be able to deal with at all or
is it (far) beyond the scope of Solr?

To be more precise, I'll give you a short explanation of my
requirements. Assume, there are a couple of Solr instances running at
different places. The documents stored within those instances are all
from the same domain (bibliographic records), but it can not be ensured
that the schema definitions conform 100%. But let's say there are at
least some index fields that are present in all instances (fields with
the same name and type definition). Now, I'd like to perform a search on
all instances at the same time (with the restriction that the query
contains only those fields that overlap among the different schemas) and
combine the results in a reasonable way by utilizing the score
information associated with each hit. Please note that due to legal
issues it is not feasible to build a single index that integrates the
documents of all Solr instances under consideration.

Thanks in advance,
Sascha






Re: Field Collapsing SOLR-236

Hi Moazzam,

  Where did you get the src code from?

I am downloading it from
https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4

and the latest revision in this location is 955469.

so applying the latest patch (dated 17th June 2010) on it still generates
errors.

Any Pointers?

Regards,
Raakhi


On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan  wrote:

> I knew it wasn't me! :)
>
> I found the patch just before I read this and applied it to the trunk
> and it works!
>
> Thanks Mark and martijn for all your help!
>
> - Moazzam
>
> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
>  wrote:
> > I've added a new patch to the issue, so building the trunk (rev
> > 955615) with the latest patch should not be a problem. Due to recent
> > changes in the Lucene trunk the patch was not compatible.
> >
> > On 17 June 2010 20:20, Erik Hatcher  wrote:
> >>
> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
> >>>
> >>> p.s. I'd be glad to contribute our Maven build re-organization back to
> the
> >>> community to get Solr properly Mavenized so that it can be distributed
> and
> >>> released more often.  For us the benefit of this structure is that we
> will
> >>> be able to overlay addons such as RequestHandlers and other third party
> >>> support without having to rebuild Solr from scratch.
> >>
> >> But you don't have to rebuild Solr from scratch to add a new request
> handler
> >> or other plugins - simply compile your custom stuff into a JAR and put
> it in
> >> /lib (or point to it with  in solrconfig.xml).
> >>
> >>>  Ideally, a Maven Archetype could be created that would allow one
> rapidly
> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
> >>
> >> How's that any different than cd example; java -jar start.jar?  Or do
> you
> >> mean a Solr client webapp?
> >>
> >>> Finally, with projects such as Bobo, integration with Spring would make
> >>> configuration more consistent and request significantly less java
> coding
> >>> just to add new capabilities everytime someone authors a new
> RequestHandler.
> >>
> >> It's one line of config to add a new request handler.  How many
> ridiculously
> >> ugly confusing lines of Spring XML would it take?
> >>
> >>>  The biggest thing I learned about Solr in my work thusfar is that
> patches
> >>> like these could be standalone modules in separate projects if it
> weren't
> >>> for having to hack the configuration and solrj methods up to adopt
> them.
> >>>  Which brings me to SolrJ, great API if it would stay generic and have
> less
> >>> concern for adding method each time some custom collections and query
> >>> support for morelikethis or collapseddocs needs to be added.
> >>
> >> I personally find it silly that we customize SolrJ for all these request
> >> handlers anyway.  You get a decent navigable data structure back from
> >> general SolrJ query requests as it is, there's no need to build in all
> these
> >> convenience methods specific to all the Solr componetry.  Sure, it's
> >> "convenient", but it's a maintenance headache and as you say, not
> generic.
> >>
> >> But hacking configuration is reasonable, I think, for adding in plugins.
>  I
> >> guess you're aiming for some kind of Spring-like auto-discovery of
> plugins?
> >>  Yeah, maybe, but I'm pretty -1 on Spring coming into Solr.  It's
> overkill
> >> and ugly, IMO.  But you like it :)  And that's cool by me, to each their
> >> own.
> >>
> >> Oh, and Hi Mark! :)
> >>
> >>Erik
> >>
> >>
> >
> >
> >
> > --
> > Met vriendelijke groet,
> >
> > Martijn van Groningen
> >
>


Re: Field Collapsing SOLR-236

Hi Rakhi,

The patch is not compatible with 1.4. If you want to work with the
trunk, you'll need to get the src from
https://svn.apache.org/repos/asf/lucene/dev/trunk/

Martijn

On 18 June 2010 13:46, Rakhi Khatwani  wrote:
> Hi Moazzam,
>
>                  Where did you get the src code from?
>
> I am downloading it from
> https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
>
> and the latest revision in this location is 955469.
>
> so applying the latest patch (dated 17th June 2010) on it still generates
> errors.
>
> Any Pointers?
>
> Regards,
> Raakhi
>
>
> On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan  wrote:
>
>> I knew it wasn't me! :)
>>
>> I found the patch just before I read this and applied it to the trunk
>> and it works!
>>
>> Thanks Mark and martijn for all your help!
>>
>> - Moazzam
>>
>> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
>>  wrote:
>> > I've added a new patch to the issue, so building the trunk (rev
>> > 955615) with the latest patch should not be a problem. Due to recent
>> > changes in the Lucene trunk the patch was not compatible.
>> >
>> > On 17 June 2010 20:20, Erik Hatcher  wrote:
>> >>
>> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>> >>>
>> >>> p.s. I'd be glad to contribute our Maven build re-organization back to
>> the
>> >>> community to get Solr properly Mavenized so that it can be distributed
>> and
>> >>> released more often.  For us the benefit of this structure is that we
>> will
>> >>> be able to overlay addons such as RequestHandlers and other third party
>> >>> support without having to rebuild Solr from scratch.
>> >>
>> >> But you don't have to rebuild Solr from scratch to add a new request
>> handler
>> >> or other plugins - simply compile your custom stuff into a JAR and put
>> it in
>> >> /lib (or point to it with  in solrconfig.xml).
>> >>
>> >>>  Ideally, a Maven Archetype could be created that would allow one
>> rapidly
>> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
>> >>
>> >> How's that any different than cd example; java -jar start.jar?  Or do
>> you
>> >> mean a Solr client webapp?
>> >>
>> >>> Finally, with projects such as Bobo, integration with Spring would make
>> >>> configuration more consistent and request significantly less java
>> coding
>> >>> just to add new capabilities everytime someone authors a new
>> RequestHandler.
>> >>
>> >> It's one line of config to add a new request handler.  How many
>> ridiculously
>> >> ugly confusing lines of Spring XML would it take?
>> >>
>> >>>  The biggest thing I learned about Solr in my work thusfar is that
>> patches
>> >>> like these could be standalone modules in separate projects if it
>> weren't
>> >>> for having to hack the configuration and solrj methods up to adopt
>> them.
>> >>>  Which brings me to SolrJ, great API if it would stay generic and have
>> less
>> >>> concern for adding method each time some custom collections and query
>> >>> support for morelikethis or collapseddocs needs to be added.
>> >>
>> >> I personally find it silly that we customize SolrJ for all these request
>> >> handlers anyway.  You get a decent navigable data structure back from
>> >> general SolrJ query requests as it is, there's no need to build in all
>> these
>> >> convenience methods specific to all the Solr componetry.  Sure, it's
>> >> "convenient", but it's a maintenance headache and as you say, not
>> generic.
>> >>
>> >> But hacking configuration is reasonable, I think, for adding in plugins.
>>  I
>> >> guess you're aiming for some kind of Spring-like auto-discovery of
>> plugins?
>> >>  Yeah, maybe, but I'm pretty -1 on Spring coming into Solr.  It's
>> overkill
>> >> and ugly, IMO.  But you like it :)  And that's cool by me, to each their
>> >> own.
>> >>
>> >> Oh, and Hi Mark! :)
>> >>
>> >>        Erik
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Met vriendelijke groet,
>> >
>> > Martijn van Groningen
>> >
>>
>


Data Import Handler Rich Format Documents

I have a database containing Metadata from a content management system. 
 Part of that data includes a URL pointing to the actual published 
document which can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.


I'm already indexing the Metadata and that provides a lot of value.  The
customer, however, would like the content pointed to by the URL to be
indexed as well for more discrete searching.


This article at Lucid:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS

describes the process of coding a custom transformer.  A separate 
article I've read implies Nutch could be used to provide this 
functionality too.


What would be the best and most efficient way to accomplish what I'm 
trying to do?  I have a feeling the Lucid article might be dated and
there might be ways to do this now without any coding and maybe without
even needing to use Nutch.  I'm using the current release version of Solr.


Thanks in advance.


- Tod


Re: How to LOCK the index-dir for changes from the IndexWriter

Alex,

For something like that you may just want to directly use one of the Lucene 
lock classes to create a lock:

http://search-lucene.com/?q=lock&fc_project=Lucene&fc_type=source+code

e.g.
http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/store/SingleInstanceLockFactory.java||makeLock
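
A minimal, untested sketch of that (Lucene 2.9-era API, hypothetical
indexDirPath; the Directory must use the same lock factory as the
IndexWriter for the lock to actually exclude it):

import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Lock;

Directory dir = FSDirectory.open(new File(indexDirPath));
Lock writeLock = dir.makeLock(IndexWriter.WRITE_LOCK_NAME);
writeLock.obtain(60000);  // makeLock() only creates the Lock object;
                          // obtain() is what actually acquires it
try {
    // ... copy the index files to the backup location ...
} finally {
    writeLock.release();  // let the importer/writer continue
}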
 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Alexander Rothenberg
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 5:28:53 AM
> Subject: How to LOCK the index-dir for changes from the IndexWriter
>
> Hi,
> I'm writing a RequestHandler that manages backups of the index directory.
>
> Yes, I know there's already the replication RequestHandler that is also
> capable of creating backups, but I want the backup process to do some more
> work and not to depend on index commit points as the ReplicationHandler
> does. The reason is that we are running a custom DataImportHandler that
> starts a full-import with the option clean=false every 30 secs (that means
> it's making changes to the current index every 30 secs). Also, the importer
> process never sends a commit (because that made the whole server stall for
> a few secs).
>
> So I simply want to lock the index directory with the same mechanism that
> is used when the index gets optimized. I took a look at DirectUpdateHandler2
> / IndexWriter etc. to find out what kind of lock is used but was not
> successful in implementing it in my RequestHandler :(
>
> The big advantage of locking the index dir would be that I don't have to
> stop the index-updating process. The import would automatically hold on
> until the lock is removed and the backup is finished.
>
> Regards, Alex


Re: custom scoring phrase queries

Marco,

I don't think there is anything in Solr to do that (is there?), but you could
do it with some coding if you combined the "regular query" with a SpanFirstQuery
with a bigger boost:

http://search-lucene.com/jd/lucene/org/apache/lucene/search/spans/SpanFirstQuery.html
 
Oh, here are some examples and at the bottom you will see exactly what I 
suggested above:

http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/search/spans/package.html||SpanFirstQuery
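
A rough, untested Lucene-level sketch of that combination (2.9-era API,
hypothetical boost and position values):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// The phrase "peter john" as a span query (slop 0, in order).
SpanQuery phrase = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("name", "peter")),
        new SpanTermQuery(new Term("name", "john"))
    }, 0, true);

// Extra credit when the phrase starts within the first 2 positions.
SpanFirstQuery atStart = new SpanFirstQuery(phrase, 2);
atStart.setBoost(5.0f);

BooleanQuery combined = new BooleanQuery();
combined.add(phrase, Occur.MUST);    // the phrase must match somewhere
combined.add(atStart, Occur.SHOULD); // boost docs where it leads the field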

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Marco Martinez
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 4:34:45 AM
> Subject: custom scoring phrase queries
>
> Hi,
>
> I want to know if it's possible to get a higher score in a phrase query
> when the match is on the left side of the field. For example:
>
> doc1=name:stores peter john
> doc2=name:peter john stores
> doc3=name:peter john something
>
> if you do a search with name="peter john", the result set I want to get is:
>
> doc2
> doc3
> doc1
>
> because the terms "peter john" are on the left side of the field and
> should get a higher score.
>
> Thanks in advance,
>
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42


How to open/update/delete remote index ?

Hi,

I am working with solr in production which is configured on a remote server.

I need to delete some documents from the solr index.

I know this can be done by curl by calling solr's "update" request handler.
But I'm looking for a GUI tool.

I tried Luke, but Luke doesn't open a remote index.

Do we have any tool which can open/delete/update a remote index?

A quick reply will be appreciated.

Regards,
Abhay


Re: custom scoring phrase queries

Hi Otis,

In the end I constructed my own function query that gives more score if the
value is at the start of the field. But is it possible to tell Solr to use
SpanFirstQuery without coding? I think I have read that it's not possible.

Thanks,


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/6/18 Otis Gospodnetic 

> Marco,
>
> I don't think there is anything in Solr to do that (is there?), but you
> could do it with some coding if you combined the "regular query" with a
> SpanFirstQuery with a bigger boost:
>
>
> http://search-lucene.com/jd/lucene/org/apache/lucene/search/spans/SpanFirstQuery.html
>
> Oh, here are some examples and at the bottom you will see exactly what I
> suggested above:
>
>
> http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/search/spans/package.html||SpanFirstQuery
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Marco Martinez 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, June 18, 2010 4:34:45 AM
> > Subject: custom scoring phrase queries
> >
> > Hi,
> >
> > I want to know if it's possible to get a higher score in a phrase query
> > when the match is on the left side of the field. For example:
> >
> > doc1=name:stores peter john
> > doc2=name:peter john stores
> > doc3=name:peter john something
> >
> > if you do a search with name="peter john", the result set I want to get is:
> >
> > doc2
> > doc3
> > doc1
> >
> > because the terms "peter john" are on the left side of the field and
> > should get a higher score.
> >
> > Thanks in advance,
> >
> > Marco Martínez Bautista
> > http://www.paradigmatecnologico.com
> > Avenida de Europa, 26. Ática 5. 3ª Planta
> > 28224 Pozuelo de Alarcón
> > Tel.: 91 352 59 42
>


Re: How to open/update/delete remote index ?

Hi,

I don't think there is a GUI for this, other than the Web browser.
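
(For the curl route, deleting and committing looks something like this,
with a hypothetical host and id:

curl http://host:8983/solr/update -H 'Content-Type: text/xml' \
     --data-binary '<delete><query>id:123</query></delete>'
curl http://host:8983/solr/update -H 'Content-Type: text/xml' \
     --data-binary '<commit/>' )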


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: abhay kumar 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 9:09:43 AM
> Subject: How to open/update/delete remote index ?
> 
> Hi,

> I am working with solr in production which is configured on a remote server.
>
> I need to delete some documents from the solr index.
>
> I know this can be done by curl by calling solr's "update" request handler.
> But I'm looking for a GUI tool.
>
> I tried Luke, but Luke doesn't open a remote index.
>
> Do we have any tool which can open/delete/update a remote index?
>
> A quick reply will be appreciated.
>
> Regards,
> Abhay


Re: Data Import Handler Rich Format Documents

Tod,

You didn't mention Tika, which makes me think you are not aware of it...
You could implement a custom Transformer that uses Tika to perform rich doc 
text extraction, just like ExtractingRequestHandler does it (see 
http://wiki.apache.org/solr/ExtractingRequestHandler ).  Maybe you could even 
just call ERH from your Transformer, though that wouldn't be the most efficient.
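
A rough, untested sketch of such a transformer (hypothetical column and
field names, Tika-0.x-era API):

import java.io.InputStream;
import java.net.URL;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// For each DIH row, fetch the document behind the "url" column, extract
// its text with Tika, and add it to the row as a "content" field.
public class TikaUrlTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        String url = (String) row.get("url");
        if (url == null) return row;
        InputStream in = null;
        try {
            in = new URL(url).openStream();
            BodyContentHandler text = new BodyContentHandler(-1); // no write limit
            new AutoDetectParser().parse(in, text, new Metadata());
            row.put("content", text.toString());
        } catch (Exception e) {
            // leave the row without "content" if fetching/parsing fails
        } finally {
            if (in != null) try { in.close(); } catch (Exception ignored) {}
        }
        return row;
    }
}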

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Tod 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 8:51:02 AM
> Subject: Data Import Handler Rich Format Documents
> 
> I have a database containing Metadata from a content management system.
> Part of that data includes a URL pointing to the actual published document
> which can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.
>
> I'm already indexing the Metadata and that provides a lot of value.  The
> customer, however, would like the content pointed to by the URL to be
> indexed as well for more discrete searching.
>
> This article at Lucid:
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS
>
> describes the process of coding a custom transformer.  A separate article
> I've read implies Nutch could be used to provide this functionality too.
>
> What would be the best and most efficient way to accomplish what I'm
> trying to do?  I have a feeling the Lucid article might be dated and there
> might be ways to do this now without any coding and maybe without even
> needing to use Nutch.  I'm using the current release version of Solr.
>
> Thanks in advance.
>
> - Tod


Re: Nested table support ability

Hello,

The short answer is that you need to flatten everything.  Your index then has 
some column-db-like redundancy, but queries become simple and flat.

But: See http://blog.sematext.com/2010/06/02/lucene-digest-may-2010-3/ and 
https://issues.apache.org/jira/browse/LUCENE-2454 in particular.
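
For example (one denormalized Solr document per customer-role pair, dates
indexed as a Solr date field; field names hypothetical):

  customer_id=1, name=David, role_type=ADMIN,
  start_date=2000-01-01T00:00:00Z, end_date=2001-01-01T00:00:00Z

The "ADMIN as of now" search then becomes a flat query like:

  q=role_type:ADMIN AND start_date:[* TO NOW] AND end_date:[NOW TO *]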
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: amit_ak 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 6:02:36 AM
> Subject: Nested table support ability
> 
>
> I want to use Solr for free text and parameter based search. I need help
> to validate if Solr can help me in achieving the following requirement.
>
> Say I have two database tables having one to many relationships.
> 1. Customer - Customer Id, Customer Name, Profile
> 2. Role - Role Type, Start date, End Date, Customer Id (Foreign key to
> Customer table)
>
> The start and end date of the role decide the effectiveness of the role.
> Logic for role effectiveness: Current Date > Start Date and Current Date <=
> End Date
>
> Say I have following records in tables
> -- Customer Table
> Customer Id    Customer Name    Profile
> 1              David            Some text
> :
> :
> -- Role Table
> Customer Id    Role Type    Start date    End Date
> 1              ADMIN        01/01/2000    01/01/2001
> 1              OPERATOR     01/01/2009    01/01/2010
> :
> :
>
> If my search criteria is:
> Get me all customers playing the role of ADMIN as of the current date.
>
> Expected result - No customer records should be returned, as the current
> date (06/18/2010) is greater than the "End Date" of the ADMIN role record.
>
> I want to create an instance of “Document” per customer record and want to
> have the nested table relations embedded in the document.
>
> Please help me understand if such a requirement is possible to achieve in
> Solr.
>
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p905253.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to open/update/delete remote index ?


What kind of GUI are you looking for here?

It'd be easy to hack a "delete this hit" link into the /browse view
that now resides on trunk Solr, for example.  But I hesitate to add
that in at the risk of someone deleting things inadvertently; perhaps
an "admin" mode would be the way to build it in.


Erik

On Jun 18, 2010, at 9:09 AM, abhay kumar wrote:


Hi,

I am working with solr in production which is configured on a remote server.

I need to delete some documents from the solr index.

I know this can be done by curl by calling solr's "update" request handler.
But I'm looking for a GUI tool.

I tried Luke, but Luke doesn't open a remote index.

Do we have any tool which can open/delete/update a remote index?

A quick reply will be appreciated.

Regards,
Abhay




Comma-delimited words shown in terms as one word

Hello.
In the indexed text I have the string John,Mark,Sam. When I look at it in
TermVectorComponent it looks like johnmarksam.

I am using this type for storing data

  <fieldType ... positionIncrementGap="100">
    <analyzer>
      <tokenizer .../>
      <filter class="solr.SynonymFilterFactory" ... ignoreCase="true" expand="false"/>
      <filter class="solr.StopFilterFactory" ... words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
          generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      ...
    </analyzer>
  </fieldType>

What filter do I need to use to get John, Mark, Sam as different words?


MappingCharFilterFactory equivalent for use after tokenizer?

Hi,

Is there a token filter which does the same job as MappingCharFilterFactory but
after the tokenizer, reading the same config file?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com



Re: Data Import Handler Rich Format Documents


On 6/18/2010 9:12 AM, Otis Gospodnetic wrote:

Tod,

You didn't mention Tika, which makes me think you are not aware of it...
You could implement a custom Transformer that uses Tika to perform rich doc 
text extraction, just like ExtractingRequestHandler does it (see 
http://wiki.apache.org/solr/ExtractingRequestHandler ).  Maybe you could even 
just call ERH from your Transformer, though that wouldn't be the most efficient.



You're right, sorry.  I have looked at Tika, which I believe is used by 
Nutch too - no?


Implementing a transformer is fine.  I guess I'm being lazy and trying 
to see if a method of doing this has been incorporated into the latest 
Solr release so I can avoid coding for it.








- Original Message 

From: Tod 
To: solr-user@lucene.apache.org
Sent: Fri, June 18, 2010 8:51:02 AM
Subject: Data Import Handler Rich Format Documents

I have a database containing Metadata from a content management system.
Part of that data includes a URL pointing to the actual published document
which can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.

I'm already indexing the Metadata and that provides a lot of value.  The
customer, however, would like the content pointed to by the URL to be
indexed as well for more discrete searching.

This article at Lucid:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS

describes the process of coding a custom transformer.  A separate article
I've read implies Nutch could be used to provide this functionality too.

What would be the best and most efficient way to accomplish what I'm
trying to do?  I have a feeling the Lucid article might be dated and there
might be ways to do this now without any coding and maybe without even
needing to use Nutch.  I'm using the current release version of Solr.

Thanks in advance.



- Tod





Re: OOM on sorting on dynamic fields

Hello,
we are experiencing OOM exceptions in our single core solr instance
(on a (huge) amazon EC2 machine).
We investigated a lot in the mailing list and through jmap/jhat dump
analyzing and the problem resides in the lucene FieldCache that fills
the heap and blows up the server.

Our index is quite small but we have a lot of sort queries on fields
that are dynamic, of type long (representing timestamps), and not
present in all the documents.
Those queries apply sorting on 12-15 of those fields.

We are using solr 1.4 in production and the dump shows a lot of
Integer/Character and byte arrays filled up with 0s.
With solr's trunk code things do not change.

In the mailing list we saw a lot of messages related to these issues:
we tried truncating the dates to day precision, using missingSortLast =
true, changing the field type from slong to long, setting autowarming to
different values, and disabling and enabling caches with different values,
but we did not manage to solve the problem.

We were thinking of implementing an LRUFieldCache field type to manage
the FieldCache as an LRU and prevent the OOMs but, before starting a new
development, we want to be sure that we are not doing anything wrong
in the solr configuration or in the index generation.

Any help would be appreciated.
Regards,
Matteo


Re: Performance tuning



Otis Gospodnetic-2 wrote:
> 
> Smaller merge factor will make things worse - 
> 

- Whoops... I guess I'll change it from 5 to the default 10
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p905726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning



Otis Gospodnetic-2 wrote:
> 
> You may want to try the RPM tool, it will show you what inside of that
> QueryComponent is really slow.
> 

We are already using it :)

Where should I be concentrating? Transaction traces?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p905730.html
Sent from the Solr - User mailing list archive at Nabble.com.


customize the search algorithm of solr


Are there any means by which we can customize the search of Solr, by plugins
etc.?

I have been working on a research-based project to implement a new search
algorithm for search engines. I want to know if I can make Solr use this
algorithm to decide the resulting documents, and still allow me to use all
the rest of the features of Solr.




Re: Autocompletion with Solritas

Hi Erik,

thanks so much for your feedback!

> 
> hightlight?  highlight :)

Oops...
Seems that this parameter is false by default, though. At least it never
complained. *g*


> This is one of the beauties of the VelocityResponseWriter, freeing the  
> client from having to deal with a Solr data structure.  In my work, I  
> did this with a small Velocity template for the response. This makes a
> terms component request return exactly what the suggest component likes
> natively, suggestions textually one per line:
>
> ipod
> in
> 
>   Erik

I am beginning to understand what you mean. I haven't had a look at your
suggest template so far. But I definitely will.

Thanks again!
Chantal



Re: Data Import Handler Rich Format Documents

Tod,

I don't think DIH can do that, but who knows, let's see what others say.
Yes, Nutch uses TIKA, too.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Tod 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 10:20:34 AM
> Subject: Re: Data Import Handler Rich Format Documents
>
> On 6/18/2010 9:12 AM, Otis Gospodnetic wrote:
> > Tod,
> >
> > You didn't mention Tika, which makes me think you are not aware of it...
> > You could implement a custom Transformer that uses Tika to perform rich
> > doc text extraction, just like ExtractingRequestHandler does it (see
> > http://wiki.apache.org/solr/ExtractingRequestHandler ).  Maybe you could
> > even just call ERH from your Transformer, though that wouldn't be the
> > most efficient.
>
> You're right, sorry.  I have looked at Tika, which I believe is used by
> Nutch too - no?
>
> Implementing a transformer is fine.  I guess I'm being lazy and trying to
> see if a method of doing this has been incorporated into the latest Solr
> release so I can avoid coding for it.


Re: MappingCharFilterFactory equivalent for use after tokenizer?

> Is there a token filter which do the same job as
> MappingCharFilterFactory but after tokenizer, reading the
> same config file?

No, closest thing can be PatternReplaceFilterFactory.

http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html




Re: How to LOCK the index-dir for changes from the IndexWriter

Thx for the reply. I tried those lock methods already but still can't get it to
work. Here's what I did in my RequestHandler, at least.

The type "Directory" and all the lock stuff comes from org.apache.lucene.store.*,
and the copyFiles method is the same as for the replication RequestHandler.

I still never see a lucene-xxx-write.lock file inside the index dir like the
one I see when the index gets optimized...

Have a good WE,
regards Alex

/**
 * @param req
 * @param rsp
 */
private void createBackup(SolrQueryRequest req, SolrQueryResponse rsp) {

    String message = new String();
    String snapDir;

    if (backupLocation != null) {
        snapDir = backupLocation + "/" + solrCore.getName();
    } else {
        snapDir = solrCore.getDataDir();
    }

    try {
        // backup dir
        SimpleDateFormat fmt = new SimpleDateFormat(FFBackupConfigDefaults.DATE_FMT);
        String directoryName = "snapshot." + fmt.format(new Date()); // name must not be changed
        message += "directoryName: " + directoryName + "; \n";

        // open the index dir
        DirectoryFactory dirFac = solrCore.getDirectoryFactory();
        String indexDirPath = solrCore.getIndexDir();
        Directory indexDir = dirFac.open(indexDirPath);

        // check for the solr-core directory in the backup location first
        // and create it if necessary:
        if (!new File(snapDir).exists()) {
            new File(snapDir).mkdir();
        }

        File snapShotDir = new File(snapDir, directoryName);
        if (snapShotDir.mkdir()) {

            // LOCK
            String rawLockType = (null == idxConfig) ? null : idxConfig.lockType;
            if (null == rawLockType) {
                // we default to "simple" for backwards compatibility
                LOG.warn("No lockType configured for " + indexDirPath + " assuming 'simple'");
                rawLockType = "simple";
            }
            final String lockType = rawLockType.toLowerCase().trim();
            if (lockType.equals("simple")) {
                // multiple SimpleFSLockFactory instances should be OK
                indexDir.setLockFactory(new SimpleFSLockFactory(indexDirPath));
            } else if (lockType.equals("native")) {
                indexDir.setLockFactory(new NativeFSLockFactory(indexDirPath));
            } else if (lockType.equals("single")) {
                if (!(indexDir.getLockFactory() instanceof SingleInstanceLockFactory)) {
                    indexDir.setLockFactory(new SingleInstanceLockFactory());
                }
            } else if (lockType.equals("none")) {
                // Recipe for disaster
                LOG.error("CONFIGURATION WARNING: locks are disabled on " + indexDirPath);
                indexDir.setLockFactory(new NoLockFactory());
            } else {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "Unrecognized lockType: " + rawLockType);
            }

            // NB: makeLock() only creates the Lock object; nothing here
            // ever calls obtain() on it, so no write.lock file appears.
            indexDir.makeLock(org.apache.lucene.index.IndexWriter.WRITE_LOCK_NAME);

            List files2copy = Arrays.asList(indexDir.listAll());
            message += "files2copy: " + files2copy + "; \n";

            copyFiles(files2copy, snapShotDir);

            indexDir.clearLock(org.apache.lucene.index.IndexWriter.WRITE_LOCK_NAME);

            message += "done\n";
            rsp.add("result", message);

        } else {
            message += "Cannot create snapshot directory: " + snapShotDir.getAbsolutePath() + "\n";
            LOG.error(message);
        }
        LOG.info(message);

    } catch (IOException e) {
        message += "Error on creating backup: " + e + "\n";
        LOG.error(message, e);
    }
}

Re: Comma-delimited words shown in terms as one word

set generateWordParts=1 on wordDelimiter or use
PatternTokenizerFactory to split on commas

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory


you can use the analysis page to see what your filter chains are going
to do before you index

/admin/analysis.jsp
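
In the schema that would look something like (replacing your current
tokenizer; attribute names per the wiki page above):

  <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>

or, keeping your current chain, just flip generateWordParts="1" on the
WordDelimiterFilterFactory so the comma-separated parts come out as
separate tokens instead of being catenated.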

On Fri, Jun 18, 2010 at 6:41 AM, Vitaliy Avdeev  wrote:
> Hello.
> In the indexed text I have the string John,Mark,Sam. When I look at it in
> TermVectorComponent it looks like johnmarksam.
>
> I am using this type for storing data
>
>   <fieldType ... positionIncrementGap="100">
>     <analyzer>
>       <tokenizer .../>
>       <filter class="solr.SynonymFilterFactory" ... ignoreCase="true" expand="false"/>
>       <filter class="solr.StopFilterFactory" ... words="stopwords.txt"/>
>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>           generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>       ...
>     </analyzer>
>   </fieldType>
>
> What filter do I need to use to get John, Mark, Sam as different words?
>


performance sorting multivalued field


hey there!
can someone explain the impact of having multivalued fields when sorting?
I have read in other threads how it affects faceting but couldn't
find any info on the impact when sorting
Thanks in advance

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p905943.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dismax and AND as the default operator

Standard DisMax does not fully support explicit AND/OR.
You can prove that by trying to say q=fuel+OR+cell and see that the score stays 
the same (given mm=100%)
It appears that DisMax does SOME intelligent handling of AND/OR/NOT, because it 
adds the "+" on the AND and a "-" on the NOT. But adding a "+" is redundant and 
does not change anything as long as mm=100%. The NOT actually seems to work, 
but the OR does not have any effect due to the "+" on the top-level ().

If you need boolean syntax support in DisMax, try the defType=edismax with 
patch SOLR-1553 or alternatively on branch_3x

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18 June 2010, at 02:44, Erik Hatcher wrote:

> Hmmm, maybe I'm wrong and it does support AND.  Looking at the code I don't 
> see why it wouldn't, actually.  Though I believe I've seen it documented that 
> it isn't supported (or at least not advertised to support).  Ok, from the 
> dismax wiki page it says: "This query handler supports an extremely 
> simplified subset of the Lucene QueryParser syntax. Quotes can be used to 
> group phrases, and +/- can be used to denote mandatory and optional clauses". 
>  Only special single characters are escaped.  So AND/OR must work.  Learn 
> something new every day!
> 
>   Erik
> 
> 
> 
> On Jun 17, 2010, at 8:28 PM, Tommy Chheng wrote:
> 
>> Thanks, Erik. That does work. I misunderstood the documentation, I thought 
>> "clause" meant "field" rather than the terms in the query.
>> 
>> If dismax doesn't support the operator AND,  why would the query 
>> "solr/select?q=fuel+cell" and "solr/select?q=fuel+AND+cell" get parsed 
>> differently (it adds the + for the AND query) and have different result counts?
>> 
>> @tommychheng
>> Programmer and UC Irvine Graduate Student
>> Find a great grad school based on research interests: 
>> http://gradschoolnow.com
>> 
>> 
>> On 6/17/10 5:17 PM, Erik Hatcher wrote:
>>> dismax does not support the operator AND.  It uses +/- only.
>>> 
>>> set mm=100% (not 1), as Hoss said, and try your query again.
>>> 
>>>   Erik
>>> 
>>> On Jun 17, 2010, at 8:08 PM, Tommy Chheng wrote:
>>>
>>>> I don't think setting the mm helps.
>>>> I have mm to 1, which means the query terms should be in at least one
>>>> field. Both query strings satisfy this condition.
>>>>
>>>> The query "solr/select?q=fuel+cell" is parsed as
>>>>
>>>> "querystring":"fuel cell",
>>>> "parsedquery":"+((DisjunctionMaxQuery((text:fuel |
>>>> organization_name_ws_lc:fuel^5.0)) DisjunctionMaxQuery((text:cell |
>>>> organization_name_ws_lc:cell^5.0)))~1) ()",
>>>> "parsedquery_toString":"+(((text:fuel | organization_name_ws_lc:fuel^5.0)
>>>> (text:cell | organization_name_ws_lc:cell^5.0))~1) ()",
>>>>
>>>> returns ~900 results
>>>>
>>>> The query "solr/select?q=fuel+AND+cell" is parsed as
>>>>
>>>> "querystring":"fuel AND cell",
>>>> "parsedquery":"+(+DisjunctionMaxQuery((text:fuel |
>>>> organization_name_ws_lc:fuel^5.0)) +DisjunctionMaxQuery((text:cell |
>>>> organization_name_ws_lc:cell^5.0))) ()",
>>>> "parsedquery_toString":"+(+(text:fuel | organization_name_ws_lc:fuel^5.0)
>>>> +(text:cell | organization_name_ws_lc:cell^5.0)) ()",
>>>> returns ~80 results
>>>>
>>>> (this is the behavior I want for the query "fuel cell" because it adds
>>>> the extra +). I want to do this without adding the AND for every query.
>>>>
>>>> @tommychheng
>>>> Programmer and UC Irvine Graduate Student
>>>> Find a great grad school based on research interests:
>>>> http://gradschoolnow.com
>>>>
>>>> On 6/17/10 4:19 PM, Chris Hostetter wrote:
>>>>> :  I'm using the dismax request handler and want to set the default
>>>>> : operator to AND.
>>>>> : Using the standard handler, i could just use the q.op or
>>>>> : defaultOperator in the schema, but this doesn't work using the dismax
>>>>> : request handler.
>>>>> :
>>>>> : For example, if I call "solr/select/?q=fuel+cell", I want solr to
>>>>> : handle it as a "solr/select/?q=fuel+AND+cell"
>>>>>
>>>>> Please consult the dismax docs...
>>>>> http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
>>>>>
>>>>> dismax uses the "mm" param to decide how clauses that don't have an
>>>>> explicit operator will be dealt with -- the default is to require 100% of
>>>>> the terms, so if you aren't seeing that behavior then you have a
>>>>> solrconfig.xml that sets the default mm value to something else.
>>>>>
>>>>> Starting with Solr 4.0 (and maybe 3.1 if it's backported) the default mm
>>>>> will be based on the value of q.op (see SOLR-1889 for more details)
>>>>>
>>>>> -Hoss
>>>>>
>>>
>



Re: ranking question

Consider upgrading to the 3.1 branch which gives you true sort by function
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
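
For example, something like (hypothetical numeric fields and weights):

  q=...&sort=sum(product(2,field1),product(3,field2)) desc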

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18 June 2010, at 01:23, Chris Hostetter wrote:

> 
> : I want to reorder the results as per function like
> : sum(w0*score, w1*field1, w2*field2, w3*filed3,..)
> : 
> : I am using solr1.4 and it seems it does not support sort by function.
> : 
> : How can this be achieved
> : 
> : I tried using
> :  q=(query)^w0 (_val_:field1)^w1 (_val_:field2...)^w2
> 
> try fq=(query)&q={!func}sum(...)
> 
> ...if you can't express the entire query as a pure function, and need to 
> resort to a BooleanQuery consisting of many individual function queries 
> (like in your example) then consider writing a custom Similarity class 
> that eliminates the querynorm.
> 
> 
> -Hoss
> 



Re: finding out why a document is in the result

Are you wanting to do this on every single user query, and present to the end
user which words matched where? In that case debugQuery may be too much, and I
would look into creating a custom DebugComponent optimized to only output
the core parts of the "explain" section that you need.

If this is some support department admin tool, I would recommend implementing a 
hidden query parameter in your front-end, which turns on debugQuery=true and 
pulls out the "explain" part, parses it and outputs a simple list of matching 
terms in each field inline - for admin users only.

What info are you missing from debugQuery?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18 June 2010, at 12:02, Lukas Kahwe Smith wrote:

> 
> On 18.06.2010, at 12:00, Fornoville, Tom wrote:
> 
>> Hi Lukas,
>> 
>> Have you tried setting the debug mode (debugQuery=on)?
>> It provides very detailed info about the scoring, it might even be too
>> much for a regular user but for us it was very helpful at times.
> 
> 
> Yeah, that was the second thing I looked at. It doesn't really contain the
> info required, plus it's obviously quite slow too.
> 
> regards,
> Lukas Kahwe Smith
> m...@pooteeweet.org
> 
> 
> 



Re: performance sorting multivalued field

do you mean sorting facets?  or sorting search results?   you can't  
sort search results by a multivalued field - which value would it use?


Erik

On Jun 18, 2010, at 12:45 PM, Marc Sturlese wrote:



hey there!
can someone explain the impact of having multivalued fields when sorting?
I have read in other threads how it affects faceting but couldn't
find any info on the impact when sorting
Thanks in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p905943.html
Sent from the Solr - User mailing list archive at Nabble.com.




solr indexing takes a long time and is not reponsive to abort command

Hi,

I have a multi-core solr setup. All cores finished indexing in reasonable time
but one. I looked at the dataimport info for the one that's hanging. The process
is still in the busy state, but no requests are made and no rows fetched. The
database side just showed the process waiting for a future command and doing
nothing. The attempt to abort the process doesn't really work. Does anyone know
what's happening here? Thanks!

Wen


RE: performance sorting multivalued field

Hi,

I have sorting on a multivalued field with the field collapse plugin. Solr always
uses the first value it gets from the search result when sorting multivalued fields.
I might be wrong, but I vaguely remember it's the smallest value.

Wen

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Friday, June 18, 2010 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: performance sorting multivalued field

do you mean sorting facets?  or sorting search results?   you can't  
sort search results by a multivalued field - which value would it use?

Erik

On Jun 18, 2010, at 12:45 PM, Marc Sturlese wrote:

>
> hey there!
> can someone explain to me how having multivalued fields impacts
> sorting?
> I have read in other threads how it affects faceting but couldn't
> find any info on the impact when sorting
> Thanks in advance
>
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p905943.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to index the words of a lecture transcript, and the timecodes for each word?

After some more research, it seems that I might be able to use payloads to 
store the timecodes with the words, though this would appear to require some 
custom java code.  I found this post useful:

http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

(thanks, Grant!)
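For anyone else heading down this road, here is a rough sketch of the payload idea against the Lucene 2.9 API that Solr 1.4 bundles. The word|starttime input format is an assumption, i.e. each transcript word would be pre-joined with its start time before indexing:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.IntegerEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TimecodePayloadSketch {
    public static void main(String[] args) throws Exception {
        // each word carries its start time (ms) after the '|' delimiter
        String transcript = "in|6183 physics|6288 we|7186 explore|7342";
        TokenStream ts = new DelimitedPayloadTokenFilter(
                new WhitespaceTokenizer(new StringReader(transcript)),
                '|', new IntegerEncoder());
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
        while (ts.incrementToken()) {
            // payload bytes decode back to the start time in milliseconds
            int startMs = PayloadHelper.decodeInt(payload.getPayload().getData(), 0);
            System.out.println(term.term() + " -> " + startMs + " ms");
        }
    }
}

Solr 1.4 exposes the same filter as DelimitedPayloadTokenFilterFactory, so the custom Java may only be needed when reading the payloads back at query time.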

On Jun 16, 2010, at 9:50 PM, Peter Wilkins pwilk...@mit.edu wrote:

> I have lecture transcripts with start and stop times for each word.  The time 
> codes allow us to search the transcripts, and show the part of the lecture 
> video that contain the search results.  I want to structure the index so that 
> I can search the transcripts for phrases, and have the search results contain 
> a portion of the transcript containing the query terms, as well as metadata 
> identifying the transcript, video, and time codes that will allow me to 
> position the video player at the correct point for playback.  
> 
> Here's what the raw input data looks like (time codes are in milliseconds):
> 
> ...
> 6183 6288 in
> 6288 6868 physics
> 7186 7342 we
> 7342 8013 explore
> 9091 9181 the
> 9181 9461 very
> 9461 9956 small
> 10741 10862 to
> 10862 10946 the
> 10946 11226 very
> 11226 11686 large
> ..
> 
> 
> Can someone offer some guidance as to how I can structure the upload data to 
> perform this magic?  I want to believe that someone with more Solr/Lucene 
> knowledge than I can see their way through this problem.
> 
> thank you,
> Peter



Re: performance sorting multivalued field


I mean sorting the query results, not facets.
I am asking because I have added a multivalued field that has as many as 10
values. But 70% of the docs have just 1 or 2 values in this multiValued
field. I am not doing faceting.
Since I added the multiValued field, the "java old gen" seems to fill up
more quickly and GCs are happening more often.
I don't see why multiValued fields would use more memory for querying by normal
relevance. That's why I think maybe it's the sort queries' fault...
Any explanation or advice?
Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p906115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import Handler Rich Format Documents


On 6/18/2010 11:24 AM, Otis Gospodnetic wrote:

Tod,

I don't think DIH can do that, but who knows, let's see what others say.
Yes, Nutch uses TIKA, too.

 Otis


Looks like the ExtractingRequestHandler uses Tika as well.  I might just 
use this but I'm wondering if there will be a large performance 
difference between using it to batch content in versus rolling my own 
Transformer?



- Tod



Re: Data Import Handler Rich Format Documents


: > I don't think DIH can do that, but who knows, let's see what others say.

: Looks like the ExtractingRequestHandler uses Tika as well.  I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to batch content in versus rolling my own Transformer?

I'm confused ... You're using DIH, and some of your fields are URLs to 
documents that you want to parse with Tika?

Why would you need a custom Transformer?

http://wiki.apache.org/solr/DataImportHandler#Tika_Integration
http://wiki.apache.org/solr/TikaEntityProcessor

-Hoss



Re: Autocompletion with Solritas



On Jun 17, 2010, at 8:34pm, Erik Hatcher wrote:

Your wish is my command.  Check out trunk, fire up Solr (ant run- 
example), index example data, hit http://localhost:8983/solr/browse  
- type in search box.


That works - excellent!

Now I'm trying to build a distribution from trunk that I can use for  
prototyping, and noticed a few things...


1. From a fresh check-out, you can't build from the trunk/solr sub-dir  
due to dependencies on Lucene classes. Once you've done a top-level  
"ant compile" then you can cd into /solr and do ant builds.


2. I noticed the run-example target in trunk/solr/build.xml doesn't  
have a description, so it doesn't show up with ant -p.


3. I tried "ant create-package" from trunk/solr, and got this error  
near the end:


/Users/kenkrugler/svn/lucene/lucene-trunk/solr/common-build.xml:252: /Users/kenkrugler/svn/lucene/lucene-trunk/solr/contrib/velocity/src not found.


I don't see contrib/velocity anywhere in the Lucene trunk tree.

What's the recommended way to build a Solr distribution from trunk?

In the meantime I'll just use example/start.jar with solr.solr.home  
and solr.data.dir system properties.


Thanks,

-- Ken



Just used jQuery's autocomplete plugin and the terms component for  
now, on the name field.  Quite simple to plug in, actually.  Check  
the commit diff.  The main magic is doing this:


  


Stupidly, though, jQuery's autocomplete seems to be hardcoded to  
send a q parameter, but I coded it to also send the same value as  
terms.prefix - but this could be an issue if hitting a different  
request handler where q is used for the actual query for filtering  
terms on.


Cool?!   I think so!  :)

Erik


On Jun 17, 2010, at 8:03 PM, Ken Krugler wrote:


I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks  
distro & Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to  
Solr facets, but I was curious if there were any tricks or traps in  
getting it all to work.


Thanks,

-- Ken




+1 530-265-2225





Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: SolrQuery and escaping special characters


: I am not sure how SolrJ behaves regarding "escaping" special characters
: [1] in a query string.

SolrJ encodes your strings for "transport" -- ie: it handles URL escaping 
if it's sending the query in a GET URL -- but it doesn't do "query parser 
escaping" ... mainly because it has no way of knowing which query parser 
you are using.


-Hoss
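A follow-up for the archives: SolrJ does ship a helper for query parser escaping, ClientUtils.escapeQueryChars. A small sketch, assuming the helper is present in your SolrJ version:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeSketch {
    public static void main(String[] args) {
        String uri = "http://example.com#foo";
        // backslash-escapes Lucene query syntax characters (e.g. the ':')
        SolrQuery query = new SolrQuery("uri:" + ClientUtils.escapeQueryChars(uri));
        System.out.println(query.getQuery());  // e.g. uri:http\://example.com#foo
    }
}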



Re: Autocompletion with Solritas



On Jun 18, 2010, at 2:56 PM, Ken Krugler wrote:
Your wish is my command.  Check out trunk, fire up Solr (ant run- 
example), index example data, hit http://localhost:8983/solr/browse  
- type in search box.


That works - excellent!

Now I'm trying to build a distribution from trunk that I can use for  
prototyping, and noticed a few things...


1. From a fresh check-out, you can't build from the trunk/solr sub- 
dir due to dependencies on Lucene classes. Once you've done a top- 
level "ant compile" then you can cd into /solr and do ant builds.


sigh.  Hopefully we'll shake these things out better over time.  This  
is just one of the growing pains of the lucene/solr merge.


2. I noticed the run-example target in trunk/solr/build.xml doesn't  
have a description, so it doesn't show up with ant -p.


Fixed.  It was/is at least documented in the usage (just type ant).

3. I tried "ant create-package" from trunk/solr, and got this error  
near the end:


/Users/kenkrugler/svn/lucene/lucene-trunk/solr/common-build.xml:252: /Users/kenkrugler/svn/lucene/lucene-trunk/solr/contrib/velocity/src not found.


I don't see contrib/velocity anywhere in the Lucene trunk tree.


Ok, I'm looking into this and will clean it up.  Darn you maven!


What's the recommended way to build a Solr distribution from trunk?


Good question.  Anyone...?  :)

You've done it as I have, sorry for the too brief instructions earlier  
- I have run "ant compile" from the top, but didn't realize it was  
necessary first.


Erik



Re: MappingCharFilterFactory equivalent for use after tokenizer?

It would be nice to have, because sometimes you want to normalize accents and 
other characters but want to wait until other filters have run. Especially if 
those filters are dictionary based and therefore need the original word form.

Do you have a clue of how different a CharFilter is from a normal token Filter 
- perhaps it is a quick port?
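Not a port, but for the simple one-to-one case (é -> e) the same effect can be had in a plain TokenFilter. A sketch against the Lucene 2.9 attribute API, with the mapping table assumed to be loaded from the config file elsewhere; many-to-one mappings such as ü -> ue change the term length and are not handled here, which is exactly where the CharFilter machinery earns its keep:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class CharMappingTokenFilter extends TokenFilter {
    private final Map<Character, Character> charMap;
    private final TermAttribute termAtt;

    public CharMappingTokenFilter(TokenStream input, Map<Character, Character> charMap) {
        super(input);
        this.charMap = charMap;
        this.termAtt = addAttribute(TermAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        // rewrite the term buffer in place, one character at a time
        char[] buf = termAtt.termBuffer();
        int len = termAtt.termLength();
        for (int i = 0; i < len; i++) {
            Character mapped = charMap.get(buf[i]);  // e.g. 'é' -> 'e'
            if (mapped != null) buf[i] = mapped;
        }
        return true;
    }
}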

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18. juni 2010, at 18.38, Ahmet Arslan wrote:

>> Is there a token filter which do the same job as
>> MappingCharFilterFactory but after tokenizer, reading the
>> same config file?
> 
> No, closest thing can be PatternReplaceFilterFactory.
> 
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html
> 
> 
> 



Re: Data Import Handler Rich Format Documents

On Fri, Jun 18, 2010 at 2:42 PM, Chris Hostetter
 wrote:
> I'm confused ... You're using DIH, and some of your fields are URLs to
> documents that you want to parse with Tika?
>
> Why would you need a custom Transformer?

Yeah, I can definitely vouch that DIH can handle this without
additional coding. (The Lucid article the OP linked to looks like it's
defining a custom Transformer because the document is in a BLOB in the
database.)

However, the DIH in Solr 1.4 doesn't have the Tika support you'd need.
You would need to go with either trunk or branch_3x to make this work.

Sixten


Bizarre TFV output

Hi,
  I am using a recent nightly build of Solr with no significant schema
mods. I index a couple documents and view the TFV's in this query.





Re: Bizarre TFV output

darn evolution...

Anyway, I am using a recent nightly build of Solr with no significant
schema mods. I index a couple documents and view the TFV's in this
query.

http://localhost:8080/solr4/select/?q=search&start=0&rows=10&indent=on&qt=tvrh&tv.tf=true&tv=true&fl=text_t&tv.docids=ALL30002

It shows some unwanted and possibly erroneous terms.

 
[seven term entries followed here, with tf counts 5, 1, 2, 2, 2, 1, and 5;
the term names, including the run-together term
"queriesandreadmoreresultseventhoughthisexampleissimpleconsidercaseswheretherear",
were stripped from the XML by the mail archive]

And some improper stemming (e.g. requir instead of require).

This seems buggy to me. Are these correct? If so, how can I sort out the
legit terms from these messy ones?

thanks for any tips!
Darren

On Fri, 2010-06-18 at 15:33 -0400, Darren Govoni wrote:

> Hi,
>   I am using a recent nightly build of Solr with no significant schema
> mods. I index a couple documents and view the TFV's in this query.
> 
> 
> 




Re: Data Import Handler Rich Format Documents


On 6/18/2010 2:42 PM, Chris Hostetter wrote:

: > I don't think DIH can do that, but who knows, let's see what others say.

: Looks like the ExtractingRequestHandler uses Tika as well.  I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to batch content in versus rolling my own Transformer?

I'm confused ... You're using DIH, and some of your fields are URLs to 
documents that you want to parse with Tika?


Why would you need a custom Transformer?


I started this thread after reading a Lucid article suggesting a custom 
Transformer might be the way to go when using DIH.  My initial question 
was if there was an alternative.


My database contains only Metadata and a reference to the actual content 
(HTML, Office Documents, PDF...) as a URL - not blobs as the Lucid 
article focused on.  What I would like to do is use DIH somehow to index 
the Metadata but also the actual content pointed to by the URL column.


I might be able to do this instead with Nutch (which uses Tika), haven't 
thoroughly researched this yet, or I can write a job to pull all the 
URL's out of the database and utilize cURL and the Solr 
ExtractingRequestHandler to push everything into the index.  I just 
wanted to see what everybody else is doing and what my other options 
might be.



Thanks - Tod


Ref:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS


Re: MailEntityProcessor class cast exception


: org.apache.solr.handler.dataimport.MailEntityProcessor cannot be cast to
: org.apache.solr.handler.dataimport.EntityProcessor
...
: I did try to rebuild the solr nightly, but I still receive the same error.
:  I have all of the required jar's (AFAIK) in my application's lib folder.
: 
: Any ideas?

My best guess is that you are loading different versions of different jars 
-- so that you have MailEntityProcessor from one version of solr and 
EntityProcessor from a different version of solr, so it can't use it.

double check all of the log messages from when you start up Solr, messages 
from the method "replaceClassLoader" will list every plugin jar it is 
loading: make sure all the jars listed are the ones you expect.

if that all checks out, then load the url 
http://localhost:8983/solr/admin/system and ensure that no solr related 
jars are listed in the classpath or bootclasspath.



-Hoss



Re: Bizarre TFV output

: It shows some unwanted and possibly erroneous terms.

they may be unwanted, but if it's returning them then they are in your 
index ... you know the docId and field in question (it's in your URL) so 
you can look at the source text, paste it into analysis.jsp and see 
exactly why those terms are being indexed based on your fieldtype -- then 
change either the source data or the fieldtype analyser as needed.

: And some improper stemming (e.g. requir instead of require).

depending on the stemmer you are using, "requir" may be a totally 
legitimate root (the programmatic stemmers like Porter and Snowball make no 
claim that the terms they produce will be real words, just that words with 
a common root will *probably* transform into the same Term)




-Hoss



Re: Bizarre TFV output

Thanks for the explanation Chris. I'll try it but the term
"queriesandreadmoreresultseventhoughthisexampleissimpleconsidercaseswheretherear"

strikes me as not very legitimate and the source text is just space
bounded words so even if it's doing what it is supposed to, I'm not sure
this term is helpful in the index.

I'm kinda new to TFV's though, so much to learn.


On Fri, 2010-06-18 at 12:43 -0700, Chris Hostetter wrote:

> : It shows some unwanted and possibly erroneous terms.
> 
> they may be unwanted, but if it's returning them then they are in your 
> index ... you know the docId and field in question (it's in your URL) so 
> you can look at the source text, paste it into analysis.jsp and see 
> exactly why those terms are being indexed based on your fieldtype -- then 
> change either the source data or the fieldtype analyser as needed.
> 
> : And some improper stemming (e.g. requir instead of require).
> 
> depending on the stemmer you are using, "requir" may be a totally 
> legitimate root (the programmatic stemmers like Porter and Snowball make no 
> claim that the terms they produce will be real words, just that words with 
> a common root will *probably* transform into the same Term)
> 
> 
> 
> 
> -Hoss
> 




Re: How to open/update/delete remote index ?

I tried luke via

ssh -X ...

with success ;-)

> Hi,
>
> I don't think there is a GUI for this, other than the Web browser.
>
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>   
>> From: abhay kumar 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, June 18, 2010 9:09:43 AM
>> Subject: How to open/update/delete remote index ?
>>
>> Hi,
>> 
>> I am working with solr in production which is configured on remote server.
>> I need to delete some documents from solr index.
>> I know this can be done by curl by calling solr "update" request handler.
>> But i'm looking for GUI tool.
>> I tried luke but luke doesn't open remote index.
>> Do we have any tool which can open/delete/update remote index?
>> A quick reply will be appreciated.
>> 
>> Regards,
>> Abhay
>


-- 
http://karussell.wordpress.com/



Re: Bizarre TFV output


: Thanks for the explanation Chris. I'll try it but the term
: "queriesandreadmoreresultseventhoughthisexampleissimpleconsidercaseswheretherear"
: 
: strikes me as not very legitimate and the source text is just space
: bounded words so even if it's doing what it is supposed to, I'm not sure
: this term is helpful in the index.

i didn't say it was helpful -- i just said there's no indication of a bug 
in TFVC.  it may be a bug in your source data, or a bad decision in your 
field type, or a bug in the indexing code ... it's not necessarily 
"right" but nothing you've posted gives any indication of a bug in solr.

show us your fieldtype and your source data and we might be able to offer 
more help, but as is all you've shown us is that you have a really long 
term in your index.



-Hoss



Re: solr indexing takes a long time and is not reponsive to abort command

Did you kill the process or does a reload help afterwards?
Did you look into the logs? Are there errors saying sth. of a write-lock?

Peter.

> Hi,
>
> I have multi-core solr setup. All cores finished indexing in reasonable time 
> but one. I look at the dataimport info for the one that's hanging. The 
> process is still in busy state but no requests made or rows fetched. The 
> database side just showed the process is waiting for future command and is 
> doing nothing. The attempt to abort the process doesn't really work. Does 
> anyone know what's happening here? Thanks!
>
> Wen



Re: Bizarre TFV output

Well stated. You are correct.

Here is the field

 

It uses the text field type as its defined in Solr schema. I didn't
change it.

The input text is a 6-page UTF-8 text document; here is the relevant line the
term seems to be related to. Just a sentence with no specific
boundaries.

"...perform more queries and read more results. Even though this example
is simple, consider cases where there are intersections between
thousands ..."

Maybe I need to indicate tokenized?

Darren

On Fri, 2010-06-18 at 12:52 -0700, Chris Hostetter wrote:

> : Thanks for the explanation Chris. I'll try it but the term
> : " : 
> name="queriesandreadmoreresultseventhoughthisexampleissimpleconsidercaseswheretherear">
>  "
> : 
> : strikes me as not very legitimate and the source text is just space
> : bounded words so even if its doing what it is supposed to, I'm not sure
> : this term is helpful in the index.
> 
> i didn't say it was helpful -- i just said there's no indication of a bug 
> in TFVC.  it may be a bug in your source data, or a bad decision in your 
> field type, or a bug in the indexing code ... it's not necessarily 
> "right" but nothing you've posted gives any indication of a bug in solr.
> 
> show us your fieldtype and your source data and we might be able to offer 
> more help, but as is all you've shown us is that you have a really long 
> term in your index.
> 
> 
> 
> -Hoss
> 




solr indexing takes a long time and is not reponsive to abort command

I don’t see my last email showing up in the mailing list so I’m sending it again. 
Below is the original email.

Hi,

I have multi-core solr setup. All cores finished indexing in reasonable time 
but one. I look at the dataimport info for the one that’s hanging. The process 
is still in busy state but no requests made or rows fetched. The database side 
just showed the process is waiting for future command and is doing nothing. The 
attempt to abort the process doesn’t really work. Does anyone know what’s 
happening here? Thanks!

Wen


RE: solr indexing takes a long time and is not reponsive to abort command

Sorry if you received duplicate email from me. 

I checked the log, there is no error in the log and no write-lock message in 
the log. Where else can I check for more information? Can I see if any query is 
running? 

I finally killed the process and ran it again. This situation has happened a 
couple of times in our production and qa environment. It usually works after 
we kill and restart the process. However, we would like to figure out what 
happened in the first place. Thanks!

Wen
-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Friday, June 18, 2010 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: solr indexing takes a long time and is not reponsive to abort 
command

Did you kill the process or does a reload help afterwards?
Did you look into the logs? Are there errors saying sth. of a write-lock?

Peter.

> Hi,
>
> I have multi-core solr setup. All cores finished indexing in reasonable time 
> but one. I look at the dataimport info for the one that's hanging. The 
> process is still in busy state but no requests made or rows fetched. The 
> database side just showed the process is waiting for future command and is 
> doing nothing. The attempt to abort the process doesn't really work. Does 
> anyone know what's happening here? Thanks!
>
> Wen



Re: Autocompletion with Solritas



On Jun 18, 2010, at 2:56 PM, Ken Krugler wrote:
3. I tried "ant create-package" from trunk/solr, and got this error  
near the end:


/Users/kenkrugler/svn/lucene/lucene-trunk/solr/common-build.xml: 
252: /Users/kenkrugler/svn/lucene/lucene-trunk/solr/contrib/ 
velocity/src not found.


I don't see contrib/velocity anywhere in the Lucene trunk tree.


I've removed the velocity cruft that was left in the build for  
mavenization.


But I personally can't get create-package to run successfully locally,  
it ends with this:


create-package:
   [delete] Deleting: /Users/erikhatcher/dev/solucene/solr/dist/apache-solr-4.0-dev.tgz
      [tar] Building tar: /Users/erikhatcher/dev/solucene/solr/dist/apache-solr-4.0-dev.tgz

BUILD FAILED
/Users/erikhatcher/dev/solucene/solr/build.xml:715: Problem creating TAR: Input/output error


Anyone else experience that?   Or have it run successfully?

Erik



Re: solr indexing takes a long time and is not reponsive to abort command

DIH has a UI in Solr Admin that will show you the status of the indexing 
process.  Not sure if you can see that in your Solr or not.

 

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Ya-Wen Hsu 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, June 18, 2010 4:28:03 PM
> Subject: RE: solr indexing takes a long time and is not reponsive to abort command
> 
> Sorry if you received duplicate email from me. 
> 
> I checked the log, there is no error in the log and no write-lock message in 
> the log. Where else can I check for more information? Can I see if any query 
> is running? 
> 
> I finally killed the process and ran it again. This situation has happened a 
> couple of times in our production and qa environment. It usually works after 
> we kill and restart the process. However, we would like to figure out what 
> happened in the first place. Thanks!
> 
> Wen
> -Original Message-
> From: Peter Karich [mailto:peat...@yahoo.de] 
> Sent: Friday, June 18, 2010 1:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr indexing takes a long time and is not reponsive to abort command
> 
> Did you kill the process or does a reload help afterwards?
> Did you look into the logs? Are there errors saying sth. of a write-lock?
> 
> Peter.
> 
> > Hi,
> >
> > I have multi-core solr setup. All cores finished indexing in reasonable time 
> > but one. I look at the dataimport info for the one that's hanging. The 
> > process is still in busy state but no requests made or rows fetched. The 
> > database side just showed the process is waiting for future command and is 
> > doing nothing. The attempt to abort the process doesn't really work. Does 
> > anyone know what's happening here? Thanks!
> >
> > Wen


Re: Data Import Handler Rich Format Documents

I think you can use existing ExtractingRequestHandler to do the job,
i.e. add child entity to your DIH metadata




<entity name="extract" processor="XPathEntityProcessor" forEach="/response"
        url="http://localhost:8983/solr/update/extract?extractOnly=true&wt=xml&indent=on&stream.url=${metadata.url}"
        dataSource="solr">
  <!-- entity and field names are placeholders; the original tags were
       stripped by the mail archive -->
  <field column="content" xpath="/response/str" />
</entity>
That's not a working example, just the basic idea; you still need to
URI-escape the ${metadata.url} reference, probably using some transformer
(regexp, javascript?), and extract the file content from the ERH XML response
using XPath, probably with some HTML stripping as well.

HTH,
Alex

On Fri, Jun 18, 2010 at 4:51 PM, Tod  wrote:
> I have a database containing Metadata from a content management system.
>  Part of that data includes a URL pointing to the actual published document
> which can be an HTML file or a PDF, MS Word/Excel/Powerpoint, etc.
>
> I'm already indexing the Metadata and that provides a lot of value.  The
> customer however would like that the content pointed to by the URL also be
> indexed for more discrete searching.
>
> This article at Lucid:
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS
>
> describes the process of coding a custom transformer.  A separate article
> I've read implies Nutch could be used to provide this functionality too.
>
> What would be the best and most efficient way to accomplish what I'm trying
> to do?  I have a feeling the Lucid article might be dated and there might
> be ways to do this now without any coding and maybe without even needing to use
> Nutch.  I'm using the current release version of Solr.
>
> Thanks in advance.
>
>
> - Tod
>


Re: Data Import Handler Rich Format Documents

: I think you can use existing ExtractingRequestHandler to do the job,
: i.e. add child entity to your DIH metadata

why would you do this instead of using the TikaEntityProcessor as i 
already suggested in my earlier mail?



-Hoss



Re: MappingCharFilterFactory equivalent for use after tokenizer?

Indeed. Also, it should be possible to output multiple synonyms based
on the mapping: word_with_umlaut should become word_with_u and
word_with_ue as synonyms. (Ok, maybe this example is wrong, but it
illustrates the idea.)

On Fri, Jun 18, 2010 at 12:17 PM, Jan Høydahl / Cominvent
 wrote:
> It would be nice to have, because sometimes you want to normalize accents and 
> other characters but want to wait until other filters have run. Especially if 
> those filters are dictionary based and therefore need the original word form.
>
> Do you have a clue of how different a CharFilter is from a normal token 
> Filter - perhaps it is a quick port?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 18. juni 2010, at 18.38, Ahmet Arslan wrote:
>
>>> Is there a token filter which do the same job as
>>> MappingCharFilterFactory but after tokenizer, reading the
>>> same config file?
>>
>> No, closest thing can be PatternReplaceFilterFactory.
>>
>> http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html
>>
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Autocompletion with Solritas


Hi Erik,

On Jun 17, 2010, at 8:34pm, Erik Hatcher wrote:

Your wish is my command.  Check out trunk, fire up Solr (ant run- 
example), index example data, hit http://localhost:8983/solr/browse  
- type in search box.


Just used jQuery's autocomplete plugin and the terms component for  
now, on the name field.  Quite simple to plug in, actually.  Check  
the commit diff.  The main magic is doing this:


  


Stupidly, though, jQuery's autocomplete seems to be hardcoded to  
send a q parameter, but I coded it to also send the same value as  
terms.prefix - but this could be an issue if hitting a different  
request handler where q is used for the actual query for filtering  
terms on.


Let's say, just for grins, that a different field (besides "name") is  
being used for autocompletion.


What would be all the places I'd need to hit to change the field,  
besides the terms.fl value in layout.vm? For example, what about  
browse.vm:


$("input[type=text]").autoSuggest("/solr/suggest",  
{selectedItemProp: "name", searchObjProps: "name"}});


I'm asking because I'm trying to use this latest support with an index  
that uses "product_name" for the auto-complete field, and I'm not  
getting any auto-completes happening.


I see from the Solr logs that requests being made to /solr/terms  
during auto-complete that look like:


INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1276903135595&terms.fl=product_name&q=rug&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=rug} status=0 QTime=0


Which I'd expect to work, but don't seem to be generating any results.

What's odd is that if I try curling the same thing:

curl -v "http://localhost:8983/solr/terms?limit=10×tamp=1276903135595&terms.fl=product_name&q=rug&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=rug 
"


I get an empty HTML response:

< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Server: Jetty(6.1.22)

If I just use what I'd consider to be the minimum set of parameters:

curl -v "http://localhost:8983/solr/terms?limit=10&terms.fl=product_name&q=rug&terms.sort=count&terms.prefix=rug 
"


Then I get the expected XML response:

< Content-Type: text/xml; charset=utf-8
< Content-Length: 225
< Server: Jetty(6.1.22)
<


<response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
 <lst name="terms"><lst name="product_name"><int name="...">7</int></lst></lst>
</response>



Any ideas what I'm doing wrong?

Thanks,

-- Ken



On Jun 17, 2010, at 8:03 PM, Ken Krugler wrote:


I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks  
distro & Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to  
Solr facets, but I was curious if there were any tricks or traps in  
getting it all to work.


Thanks,

-- Ken




Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: MappingCharFilterFactory equivalent for use after tokenizer?

On Fri, Jun 18, 2010 at 7:11 PM, Lance Norskog  wrote:

> Indeed. Also, it should be possible to output multiple synonyms based
> on the mapping: word_with_umlaut should be become word_with_u and
> word_with_ue as synonyms. (Ok, maybe this example is wrong, but it
> illustrates the idea.)
>
>
I don't think we should do this. how many tokens would üüüüüüüüüü make?
(such malformed input exists in the wild, e.g. someone spills beer on their
keyboard and the key gets sticky)

-- 
Robert Muir
rcm...@gmail.com


Re: federated / meta search

Yes, you can do this. You need to have a common system for creating
unique ids for the documents.

Also, there's an odd problem around relevance. Relevance scoring is
based on all of the terms in a field in the whole index, and there is
a "statistical fingerprint" of this for an index. With two indexes
from two sources, the terms in the documents will not have the same
"fingerprint". Relevance scores from one shard will not match the
meaning of a document's score in the other shard.

There is a project to make this work in Solr, but it is not nearly finished.
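As a concrete illustration of the distributed-search route: a sketch assuming the Solr 1.4 shards parameter, with made-up host names; any node can act as the aggregator:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuerySketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://hostA:8983/solr");
        SolrQuery q = new SolrQuery("title:records");  // only use overlapping fields
        // shard entries are host:port/path, without the http:// prefix
        q.set("shards", "hostA:8983/solr,hostB:8983/solr");
        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

The unique-id requirement matters here: documents sharing a uniqueKey across shards are collapsed to a single result.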

Lance Norskog

On Fri, Jun 18, 2010 at 4:28 AM, Sascha Szott  wrote:
> Hi Joe & Markus,
>
> sounds good! Maybe I should better add a note on the Wiki page on federated
> search [1].
>
> Thanks,
> Sascha
>
> [1] http://wiki.apache.org/solr/FederatedSearch
>
> Joe Calderon wrote:
>>
>> yes, you can use distributed search across shards with different
>> schemas as long as the query only references overlapping fields, i
>> usually test adding new fields or tokenizers on one shard and deploy
>> only after i verified its working properly
>>
>> On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma
>>  wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> Check out Solr sharding [1] capabilities. I never tested it with
>>> different schema's but if each node is queried with fields that it supports,
>>> it should return useful results.
>>>
>>>
>>>
>>> [1]: http://wiki.apache.org/solr/DistributedSearch
>>>
>>>
>>>
>>> Cheers.
>>>
>>> -Original message-
>>> From: Sascha Szott
>>> Sent: Thu 17-06-2010 19:44
>>> To: solr-user@lucene.apache.org;
>>> Subject: federated / meta search
>>>
>>> Hi folks,
>>>
>>> if I'm seeing it right Solr currently does not provide any support for
>>> federated / meta searching. Therefore, I'd like to know if anyone has
>>> already put efforts into this direction? Moreover, is federated / meta
>>> search considered a scenario Solr should be able to deal with at all or
>>> is it (far) beyond the scope of Solr?
>>>
>>> To be more precise, I'll give you a short explanation of my
>>> requirements. Assume, there are a couple of Solr instances running at
>>> different places. The documents stored within those instances are all
>>> from the same domain (bibliographic records), but it can not be ensured
>>> that the schema definitions conform to 100%. But lets say, there are at
>>> least some index fields that are present in all instances (fields with
>>> the same name and type definition). Now, I'd like to perform a search on
>>> all instances at the same time (with the restriction that the query
>>> contains only those fields that overlap among the different schemas) and
>>> combine the results in a reasonable way by utilizing the score
>>> information associated with each hit. Please note, that due to legal
>>> issues it is not feasible to build a single index that integrates the
>>> documents of all Solr instances under consideration.
>>>
>>> Thanks in advance,
>>> Sascha
>>>
>>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: OOM on sorting on dynamic fields

The Lucene implementation of sorting creates an array of four-byte
ints for every document in the index, and another array of the unique
values in the field.
If the timestamps are 'date' or 'tdate' in the schema, they do not
need the second array.

You can also sort by a field's values with a function query. This does not
build the arrays, but might be a little slower.
Yes, the sort arrays (and also facet values for a field) should be
controlled by a fixed-size cache, but they are not.
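A rough illustration of what that entails (Lucene 2.9 API, as bundled with Solr 1.4; the field name is made up):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class FieldCacheCostSketch {
    // Sorting on a long field makes Lucene materialize one value per document,
    // held for the lifetime of the index reader.
    static long[] sortValues(IndexReader reader) throws IOException {
        return FieldCache.DEFAULT.getLongs(reader, "event_start_l");  // maxDoc() longs
    }
    // Sorting on 12-15 dynamic timestamp fields means 12-15 such arrays, and
    // documents that lack a field still occupy a zero-filled slot, which is
    // consistent with the zero-filled arrays seen in the heap dump.
}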

On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio
 wrote:
> Hello,
> we are experiencing OOM exceptions in our single core solr instance
> (on a (huge) amazon EC2 machine).
> We investigated a lot in the mailing list and through jmap/jhat dump
> analyzing and the problem resides in the lucene FieldCache that fills
> the heap and blows up the server.
>
> Our index is quite small but we have a lot of sort queries  on fields
> that are dynamic,of type long representing timestamps and are not
> present in all the documents.
> Those queries apply sorting on 12-15 of those fields.
>
> We are using solr 1.4 in production and the dump shows a lot of
> Integer/Character and Byte Array filled up with 0s.
> With solr's trunk code things does not change.
>
> In the mailing list we saw a lot of messages related to this issues:
> we tried truncating the dates to day precision,using missingSortLast =
> true,changing the field type from slong to long,setting autowarming to
> different values,disabling and enabling caches with different values
> but we did not manage to solve the problem.
>
> We were thinking to implement an LRUFieldCache field type to manage
> the FieldCache as an LRU and preventing but, before starting a new
> development, we want to be sure that we are not doing anything wrong
> in the solr configuration or in the index generation.
>
> Any help would be appreciated.
> Regards,
> Matteo
>



-- 
Lance Norskog
goks...@gmail.com


Re: customize the search algorithm of solr

Solr uses Lucene's algorithms, so lucene-user is the right place for this topic.

There is a project to add BM25 (or something like that) to Lucene as
an alternate scorer. This may show you how to drop in your own scorer.
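To make that pointer concrete, plugging in a scoring tweak usually starts with a Similarity subclass; a sketch against the Lucene 2.9 API used by Solr 1.4. The class name is made up, and you would point schema.xml's similarity element at it:

import org.apache.lucene.search.DefaultSimilarity;

public class MySimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        // example tweak: dampen term frequency harder than the default sqrt
        return freq > 0 ? 1.0f + (float) Math.log(freq) : 0.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        // swap in your own rarity weighting (e.g. a BM25-style idf) here
        return super.idf(docFreq, numDocs);
    }

    // queryNorm(float) can likewise be overridden (e.g. to return 1.0f)
    // if the algorithm should ignore query normalization.
}

Deeper changes, i.e. a genuinely different matching and scoring model, mean custom Query/Scorer classes, which is where the BM25 work is a useful template.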

On Fri, Jun 18, 2010 at 8:14 AM, sarfaraz masood
 wrote:
>
> Are there any means by which we can customize the search of solr, by plugins 
> etc ??
>
> i have been working on a research based project to implement a new search 
> algorithm for search engines.I wanna know if i can make solr use this 
> algorithm to decide the resultant documents, and still allow me to use all 
> the rest of the features of solr.
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: solr indexing takes a long time and is not reponsive to abort command

Does this happen over and over? Does it happen every time?

On Fri, Jun 18, 2010 at 1:19 PM, Ya-Wen Hsu  wrote:
> I don’t see my last email showing up in the mailing list so I’m sending it again. 
> Below is the original email.
>
> Hi,
>
> I have multi-core solr setup. All cores finished indexing in reasonable time 
> but one. I look at the dataimport info for the one that’s hanging. The 
> process is still in busy state but no requests made or rows fetched. The 
> database side just showed the process is waiting for future command and is 
> doing nothing. The attempt to abort the process doesn’t really work. Does 
> anyone know what’s happening here? Thanks!
>
> Wen
>



-- 
Lance Norskog
goks...@gmail.com


Re: SolrQuery and escaping special characters

Thank you for the example, Ahmet!

Paolo- what you did in choice 'b' does what you want - it escapes the
colon in the URI. But Ahmet's example is a better way because it does
not have the 'double-escaping' problem that us old Unix types are so
familiar with.

On 6/18/10, Chris Hostetter  wrote:
>
> : I am not sure how SolrJ behaves regarding "escaping" special characters
> : [1] in a query string.
>
> SolrJ encodes your strings for "transport" -- ie: it handles URL escaping
> if it's sending the query in a GET URL -- but it doesn't do "query parser
> escaping" ... mainly because it has no way of knowing which query parser
> you are using.
>
>
> -Hoss
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Autocompletion with Solritas

Have a look at suggest.vm - the "name" field is used in there too.   
Just those two places, layout.vm and suggest.vm.   And I had already  
added a ## TODO in my local suggest.vm:


## TODO: make this more generic, maybe look at the request terms.fl?   
or just take the first terms field in the response?


And also, ideally, there'd be a /suggest handler mapped with the field  
name specified there.  I simply used what was already available to put  
suggest in there easily.


Erik

On Jun 18, 2010, at 7:54 PM, Ken Krugler wrote:


Hi Erik,

On Jun 17, 2010, at 8:34pm, Erik Hatcher wrote:

Your wish is my command.  Check out trunk, fire up Solr (ant run- 
example), index example data, hit http://localhost:8983/solr/browse  
- type in search box.


Just used jQuery's autocomplete plugin and the terms component for  
now, on the name field.  Quite simple to plug in, actually.  Check  
the commit diff.  The main magic is doing this:


 


Stupidly, though, jQuery's autocomplete seems to be hardcoded to  
send a q parameter, but I coded it to also send the same value as  
terms.prefix - but this could be an issue if hitting a different  
request handler where q is used for the actual query for filtering  
terms on.


Let's say, just for grins, that a different field (besides "name")  
is being used for autocompletion.


What would be all the places I'd need to hit to change the field,  
besides the terms.fl value in layout.vm? For example, what about  
browse.vm:


   $("input[type=text]").autoSuggest("/solr/suggest",  
{selectedItemProp: "name", searchObjProps: "name"}});


I'm asking because I'm trying to use this latest support with an  
index that uses "product_name" for the auto-complete field, and I'm  
not getting any auto-completes happening.


I see from the Solr logs that requests being made to /solr/terms  
during auto-complete that look like:


INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1276903135595&terms.fl=product_name&q=rug&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=rug} status=0 QTime=0


Which I'd expect to work, but don't seem to be generating any results.

What's odd is that if I try curling the same thing:

curl -v "http://localhost:8983/solr/terms?limit=10×tamp=1276903135595&terms.fl=product_name&q=rug&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=rug 
"


I get an empty HTML response:

< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Server: Jetty(6.1.22)

If I just use what I'd consider to be the minimum set of parameters:

curl -v "http://localhost:8983/solr/terms?limit=10&terms.fl=product_name&q=rug&terms.sort=count&terms.prefix=rug 
"


Then I get the expected XML response:

< Content-Type: text/xml; charset=utf-8
< Content-Length: 225
< Server: Jetty(6.1.22)
<


<response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
 <lst name="terms"><lst name="product_name"><int name="...">7</int></lst></lst>
</response>



Any ideas what I'm doing wrong?

Thanks,

-- Ken



On Jun 17, 2010, at 8:03 PM, Ken Krugler wrote:


I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks  
distro & Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to  
Solr facets, but I was curious if there were any tricks or traps  
in getting it all to work.


Thanks,

-- Ken




Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g








Re: federated / meta search

Lance, which project in Solr are you referring to?


Thanks,

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Lance Norskog 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 8:16:46 PM
> Subject: Re: federated / meta search
> 
> Yes, you can do this. You need to have a common system for creating
> unique ids for the documents.
> 
> Also, there's an odd problem around relevance. Relevance scoring is
> based on all of the terms in a field in the whole index, and there is
> a "statistical fingerprint" of this for an index. With two indexes
> from two sources, the terms in the documents will not have the same
> "fingerprint". Relevance scores from one shard will not match the
> meaning of a document's score in the other shard.
> 
> There is a project to make this work in Solr, but it is not nearly finished.
> 
> Lance Norskog
> 
> On Fri, Jun 18, 2010 at 4:28 AM, Sascha Szott <sz...@zib.de> wrote:
> > Hi Joe & Markus,
> >
> > sounds good! Maybe I should better add a note on the Wiki page on federated
> > search [1].
> >
> > Thanks,
> > Sascha
> >
> > [1] http://wiki.apache.org/solr/FederatedSearch
> >
> > Joe Calderon wrote:
> >>
> >> yes, you can use distributed search across shards with different
> >> schemas as long as the query only references overlapping fields, i
> >> usually test adding new fields or tokenizers on one shard and deploy
> >> only after i verified its working properly
> >>
> >> On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma <markus.jel...@buyways.nl>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Check out Solr sharding [1] capabilities. I never tested it with
> >>> different schema's but if each node is queried with fields that it supports,
> >>> it should return useful results.
> >>>
> >>> [1]: http://wiki.apache.org/solr/DistributedSearch
> >>>
> >>> Cheers.
> >>>
> >>> -Original message-
> >>> From: Sascha Szott <sz...@zib.de>
> >>> Sent: Thu 17-06-2010 19:44
> >>> To: solr-user@lucene.apache.org;
> >>> Subject: federated / meta search
> >>>
> >>> Hi folks,
> >>>
> >>> if I'm seeing it right Solr currently does not provide any support for
> >>> federated / meta searching. Therefore, I'd like to know if anyone has
> >>> already put efforts into this direction? Moreover, is federated / meta
> >>> search considered a scenario Solr should be able to deal with at all or
> >>> is it (far) beyond the scope of Solr?
> >>>
> >>> To be more precise, I'll give you a short explanation of my
> >>> requirements. Assume, there are a couple of Solr instances running at
> >>> different places. The documents stored within those instances are all
> >>> from the same domain (bibliographic records), but it can not be ensured
> >>> that the schema definitions conform to 100%. But lets say, there are at
> >>> least some index fields that are present in all instances (fields with
> >>> the same name and type definition). Now, I'd like to perform a search on
> >>> all instances at the same time (with the restriction that the query
> >>> contains only those fields that overlap among the different schemas) and
> >>> combine the results in a reasonable way by utilizing the score
> >>> information associated with each hit. Please note, that due to legal
> >>> issues it is not feasible to build a single index that integrates the
> >>> documents of all Solr instances under consideration.
> >>>
> >>> Thanks in advance,
> >>> Sascha
> >>>
> >>>
> >
> >
> 
> -- 
> Lance Norskog
> goks...@gmail.com


Re: Can query boosting be used with a custom request handlers?

Hi Chris:

 Can you please elaborate on how to use the QParser framework?

Thanks!

-John

On Fri, Jun 11, 2010 at 10:56 AM, Chris Hostetter
wrote:

>
> : So it's possible to use both dismax and custom request handler in the
> same query?
>
> it *really* depends on the request handler ... if it uses the QParser
> framework for query parsing, then yes it should work fine -- but the
> request handler has to be written to work that way.
>
>
> -Hoss
>
>
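For later readers who land on this thread: "using the QParser framework" boils down to registering a QParserPlugin in solrconfig.xml (a queryParser element) and having it produce the Query. A minimal sketch against the Solr 1.4 plugin API; the class name is made up, and this one just delegates to the default lucene parser:

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(final String qstr, final SolrParams localParams,
                                final SolrParams params, final SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            public Query parse() throws ParseException {
                // customize here; this sketch just delegates to the lucene parser
                return QParser.getParser(qstr, "lucene", req).parse();
            }
        };
    }
}

A handler that resolves its query through this framework can then be pointed at the custom parser with defType=myparser (or a {!myparser} local param) while keeping boosts and the rest of its behavior.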


Re: How to open/update/delete remote index ?

Hi,

@Erik I am looking for a Luke-like command-line GUI tool, without a browser,
which can open/delete a remote index.

Regards,
Abhay

On Sat, Jun 19, 2010 at 1:20 AM, Peter Karich  wrote:

> I tried luke via
>
> ssh -X ...
>
> with success ;-)
>
> > Hi,
> >
> > I don't think there is a GUI for this, other than the Web browser.
> >
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > - Original Message 
> >
> >> From: abhay kumar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Fri, June 18, 2010 9:09:43 AM
> >> Subject: How to open/update/delete remote index ?
> >>
> >> Hi,
> >>
> >> I am working with solr in production which is configured on remote server.
> >> I need to delete some documents from solr index.
> >> I know this can be done by curl by calling solr "update" request handler.
> >> But i'm looking for GUI tool.
> >> I tried luke but luke doesn't open remote index.
> >> Do we have any tool which can open/delete/update remote index?
> >> A quick reply will be appreciated.
> >>
> >> Regards,
> >> Abhay
> >
> >
>
>
> --
> http://karussell.wordpress.com/
>
>