How to achieve combination of features grouping, scoring...
Hi,

I spent some time on Solr in order to figure out what it can do. I still have some problems finding the right way to do my search.

I have a bunch of heterogeneous objects that I want to search. All of these objects belong to an owner. When a search is issued I'd like not only to find the individual objects but also to have them grouped by their owner.

For grouping I didn't find much of value other than doing it in a response writer. I tried collapsing, but that is not what I mean, and facets are something different again. The only thing I found is the XSLTResponseWriter, which groups stuff after the fact.

What is the best way to achieve this:

- how to group results when there are many of them to take into account
- how to score based on grouped objects. Grouping in the response writer is not hard, but if I want to do pagination I'd like to have the top-scored group at the top of the results. Is there a way to do so?
- I'd like to show only the fields that match a query. As someone hinted here on the ML, doing this with highlighting is the only way I found. But then I don't understand why I can provide a field list (hl.fl) yet it does not take a * for every field like some of the other parameters do.

Thanks in advance,

Norbert
Issue in Facet on date field
Hi,

I have to create two facets on a date field:
1) The first facet should cover the range [NOW TO NOW+45DAYS]
2) The second facet should cover the range [NOW-45DAYS TO NOW]

I want both results in a single query. The query I am using is:

&facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS

ISSUE: I am getting the same response in both nodes; one set of parameters overrides the other, so both facets come back with a count of 0, gap +45DAYS and end 2009-02-27T08:37:26.662Z.

Please suggest a way to differentiate these two date facets in the query.
Non-fixed highlight snippet size
Hey there,

I need a rule in my highlights that sets, for example, the snippet size to 400 in case there's just one snippet, 225 in case two snippets are found, and 125 in case 3 or more snippets are found. Is there any way to do that via solrconfig.xml (from what I have seen I don't think so...) or should I code a plugin? In the second case, do I need an extended class of GapFragmenter, or is what I should hack in another piece of the source?

Thanks in advance
Re: Issue in Facet on date field
Hey,

That's because Solr just looks for one start, end and gap param in solrconfig.xml. It allows you to do date faceting for different fields, but only over one range period. I was in the same situation as you; what I did was modify the method getFacetDateCounts() in SimpleFacets to make it read as many params (start/end/gap) as I want. Once that's done I can do date faceting over all the time periods. The result looks like counts of 2238 for gap +3MONTH, 3822 for +6MONTH and 3864 for +1YEAR, all starting from 2009-01-13T00:00:00Z; that is, facets for the last 3 months, 6 months and year.

I don't think there's a way to do that without modifying the source (if you find one let me know :D)

prerna07 wrote:
> Hi,
>
> I have to create two facets on a date field:
> 1) The first facet should cover the range [NOW TO NOW+45DAYS]
> 2) The second facet should cover the range [NOW-45DAYS TO NOW]
>
> I want both results in a single query. The query I am using is:
>
> &facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS
>
> ISSUE: I am getting the same response in both nodes; one set of parameters
> overrides the other, so both facets come back with a count of 0, gap
> +45DAYS and end 2009-02-27T08:37:26.662Z.
>
> Please suggest a way to differentiate these two date facets in the query.
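In case it helps, the idea (a rough sketch, not my actual patch; addDateFacet() is a made-up helper, and the three param lists are assumed to line up) is roughly:

    // inside getFacetDateCounts(): read every start/end/gap value for the
    // field, instead of only the first one of each
    String f = "productPublicationDate_product_dt";
    String[] starts = params.getParams("f." + f + ".facet.date.start");
    String[] ends   = params.getParams("f." + f + ".facet.date.end");
    String[] gaps   = params.getParams("f." + f + ".facet.date.gap");
    for (int i = 0; i < starts.length; i++) {
      // build one date-range facet per (start, end, gap) triple
      addDateFacet(f, starts[i], ends[i], gaps[i]); // hypothetical helper
    }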
Re: Issue in Facet on date field
There can be two other options:
1) Make two Solr queries to get the two facets.
2) Use a copyField in schema.xml.

Thanks,
Prerna

Marc Sturlese wrote:
> Hey,
>
> That's because Solr just looks for one start, end and gap param in
> solrconfig.xml. It allows you to do date faceting for different fields,
> but only over one range period. I was in the same situation as you; what
> I did was modify the method getFacetDateCounts() in SimpleFacets to make
> it read as many params (start/end/gap) as I want. Once that's done I can
> do date faceting over all the time periods. The result looks like counts
> of 2238 for gap +3MONTH, 3822 for +6MONTH and 3864 for +1YEAR, all
> starting from 2009-01-13T00:00:00Z; that is, facets for the last 3
> months, 6 months and year.
>
> I don't think there's a way to do that without modifying the source (if
> you find one let me know :D)
>
>> Hi,
>>
>> I have to create two facets on a date field:
>> 1) The first facet should cover the range [NOW TO NOW+45DAYS]
>> 2) The second facet should cover the range [NOW-45DAYS TO NOW]
>>
>> I want both results in a single query. The query I am using is:
>>
>> &facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS
>>
>> ISSUE: I am getting the same response in both nodes; one set of
>> parameters overrides the other.
>>
>> Please suggest a way to differentiate these two date facets in the query.
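For illustration, the copyField route might look something like this in schema.xml (the *_dt field names here are just placeholders):

    <field name="pubDatePast_dt"   type="date" indexed="true" stored="false"/>
    <field name="pubDateFuture_dt" type="date" indexed="true" stored="false"/>
    <copyField source="productPublicationDate_product_dt" dest="pubDatePast_dt"/>
    <copyField source="productPublicationDate_product_dt" dest="pubDateFuture_dt"/>

and then each facet gets its own per-field parameters, so neither overrides the other:

    &facet=true&facet.date=pubDatePast_dt&facet.date=pubDateFuture_dt
    &f.pubDatePast_dt.facet.date.start=NOW-45DAYS&f.pubDatePast_dt.facet.date.end=NOW&f.pubDatePast_dt.facet.date.gap=%2B45DAYS
    &f.pubDateFuture_dt.facet.date.start=NOW&f.pubDateFuture_dt.facet.date.end=NOW%2B45DAYS&f.pubDateFuture_dt.facet.date.gap=%2B45DAYS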
Re: Getting only fields that match
On Sun, 2009-01-11 at 17:07 +0530, Shalin Shekhar Mangar wrote:
> On Sun, Jan 11, 2009 at 4:02 PM, Norbert Hartl wrote:
>> I'd like the search result to include only the fields that matched the
>> search. Is this possible? I only saw the field spec where you can have
>> a certain set of fields or all.
>
> Are you looking for highlighting (snippets)?
> http://wiki.apache.org/solr/HighlightingParameters
>
> A field can be indexed (searchable) or stored (retrievable) or both. When
> you make a query to Solr, you yourself specify which fields it needs to
> search on. If they are stored, you can ask to retrieve those fields only.
> Not sure if that answers your question.

Having another look at your proposal I can see you might be right :) It seems to me to be the most doable approach for now, too.

thanks,
Norbert
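A request along those lines would look roughly like this, I think (field names invented); hl.requireFieldMatch=true seems to keep snippets to the fields that actually matched, while hl.fl takes an explicit comma-separated list rather than a *:

    http://localhost:8983/solr/select?q=title:foo+body:bar
      &fl=id,score
      &hl=true
      &hl.fl=title,body
      &hl.requireFieldMatch=true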
solrj delete by Id problem
I have a problem with solrj deleteById. If I search a keyword and it has more than one result (for example 7), and I then delete one of the resulting docs with solrj (server.deleteById) and search this keyword again, the result count is zero. That's not correct; it should be 6, and it should show the other 6 docs.

I should mention that when I restart the server the result is correct and it shows the right count (I mean 6 docs).

Besides, the problem only occurs with keywords that were searched before deleting the docs; there is no problem with new keywords.
RE: Query regarding Spelling Suggestions
Hi Grant,

My spellcheck is now working fine with the following configuration:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">word</str>
        <str name="classname">solr.IndexBasedSpellChecker</str>
        <str name="field">word</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="sourceLocation">d:\solr-tomcat\solr\data\syn_index</str>
        <str name="spellcheckIndexDir">./spellcheckerFile1</str>
      </lst>
    </searchComponent>

Earlier I had configured the Lucene index (dictionary) "syn_index" as the spellcheckIndexDir, as I interpreted from the wiki page. Then I looked into the file IndexBasedSpellChecker.java and found the usage of "sourceLocation". When I configured my Lucene index (dictionary) "syn_index" as the "sourceLocation", the IndexBasedSpellChecker worked.

I have the following questions / observations (just to ensure that my configuration is correct):

The Lucene index (dictionary) "syn_index" is already an index, so do we have to specify the spellcheckIndexDir again? (If I do not give the spellcheckIndexDir I do not get any suggestions.) When I give the build command, the spellcheckIndexDir gets populated by reading "syn_index". Can we avoid this duplication?

If the "sourceLocation" is mandatory when using a third-party index for spelling suggestions, may I update the Solr wiki to include this important information?

Thanks & Best Regards,
~Mukta

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, January 12, 2009 10:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

Solr 1.3 doesn't use Log4J, it uses Java Utility Logging (JUL). I believe the info level in the logs is sufficient. Let's start by posting what you have? Also, are you able to get the sample spellchecking to work?

On Jan 12, 2009, at 2:16 AM, Deshpande, Mukta wrote:
> Hi,
>
> Could you please send me the needful entries in log4j.properties to
> enable logging, explicitly for SpellCheckComponent.
>
> My current log4j.properties looks like:
>
> log4j.rootLogger=INFO,console
> log4j.appender.console=org.apache.log4j.ConsoleAppender
> log4j.appender.console.target=System.err
> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
> %c{2}: %m%n
> log4j.logger.org.apache.solr=DEBUG
>
> With these settings I can only see the INFO level logs.
>
> I tried to change the log level for SpellCheckComponent to "FINE" using
> the admin logging page http://localhost:8080/solr/admin/logging but
> did not see any difference in logging.
>
> Thanks,
> ~Mukta
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Monday, January 12, 2009 3:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query regarding Spelling Suggestions
>
> Can you send the full log?
>
> On Jan 11, 2009, at 1:51 PM, Deshpande, Mukta wrote:
>
>> I am using the example schema that comes with the Solr installation
>> downloaded from http://www.mirrorgeek.com/apache.org/lucene/solr/.
>> I have added the "word" field with "textSpell" fieldtype in the
>> schema.xml file, as specified in the below mail.
>>
>> My spelling index exists under /data/. If I open my index in
>> Luke I can see the entries against the "word" field.
>>
>> Thanks,
>> ~Mukta
>>
>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>> Sent: Fri 1/9/2009 8:29 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Query regarding Spelling Suggestions
>>
>> Can you put the full log (as short as possibly demonstrates the
>> problem) somewhere where I can take a look? Likewise, can you share
>> your schema?
>>
>> Also, does the spelling index exist under /data/index? If
>> you open it w/ Luke, does it have entries?
>>
>> Thanks,
>> Grant
>>
>> On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote:
>>
>>> Yes. I send the build command as:
>>> http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict
>>>
>>> The Tomcat log shows:
>>> Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/select/
>>> params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true} hits=0 status=0
>>> QTime=141
>>>
>>> Even after sending the build command I do not get any suggestions.
>>> Can you please check.
>>>
>>> Thanks,
>>> ~Mukta
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>>> Sent: Thursday, January 08, 2009 7:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Query regarding Spelling Suggestions
>>>
>>> Did you send in the build command? See
>>> http://wiki.apache.org/solr/SpellCheckComponent
>>>
>>> On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote:
>>>> Hi,
>>>> I am using the Wordnet dictionary for spelling suggestions. The
>>>> dictionary is converted to a Solr index with only one field "word"
>>>> and stored in location /data/syn_index, using the syns2Index.java
>>>> program
Re: solrj delete by Id problem
Did you call commit after the delete?

On Tue, Jan 13, 2009 at 4:12 PM, Parisa wrote:
> I have a problem with solrj deleteById. If I search a keyword and it has
> more than one result (for example 7), and I then delete one of the
> resulting docs with solrj (server.deleteById) and search this keyword
> again, the result count is zero. That's not correct; it should be 6, and
> it should show the other 6 docs.
>
> I should mention that when I restart the server the result is correct
> and it shows the right count (I mean 6 docs).
>
> Besides, the problem only occurs with keywords that were searched before
> deleting the docs; there is no problem with new keywords.

--
Regards,
Shalin Shekhar Mangar.
Re: Query regarding Spelling Suggestions
On Tue, Jan 13, 2009 at 5:16 PM, Deshpande, Mukta wrote:
> I have the following questions / observations (just to ensure that my
> configuration is correct):
>
> The Lucene index (dictionary) "syn_index" is already an index, so do we
> have to specify the spellcheckIndexDir again?
> (If I do not give the spellcheckIndexDir I do not get any suggestions.)

The "syn_index" here is the Lucene index you want to use as the source of words. The spell checker processes each token to create n-grams, which are then stored in a Lucene index at the "spellcheckIndexDir" or in memory. This is why you need to specify both sourceLocation and spellcheckIndexDir. If you do not give spellcheckIndexDir, the spell checker will create a Lucene index in memory, so it should still work. Are you sure you gave a build command before issuing the query?

> When I give the build command the spellcheckIndexDir gets populated by
> reading "syn_index". Can we avoid this duplication?

The spell checker needs a Lucene index to work. It creates a new one and adds tokens, after some processing, to this index. There is no way to avoid the creation of another index at present. However, it should be possible to modify it to store its fields inside an existing Lucene index (maybe even Solr's own index). Contributions are always welcome :)

> If the "sourceLocation" is mandatory when using a third-party index for
> spelling suggestions, may I update the Solr wiki to include this
> important information?

Sure, please go ahead. Thanks!

--
Regards,
Shalin Shekhar Mangar.
Re: Custom Transformer to handle Timestamp
On Tue, Jan 13, 2009 at 12:53 AM, con wrote:
> Hi all
>
> I am using solr to index data from my database.
> In my database there is a timestamp field whose data will be in the form
> of 15-09-08 06:28:38.44200 AM. The column is of type TIMESTAMP in the
> oracle db. So in the schema.xml I have mentioned it as:
>
> While indexing data in the debug mode I get this timestamp value as
>
>    oracle.sql.TIMESTAMP:oracle.sql.TIMESTAMP@f536e8
>
> And when I do a search this value is not displayed, while all other
> fields indexed along with it are displayed.

Hmm, interesting. It seems oracle.sql.TIMESTAMP does not inherit from java.sql.Timestamp or java.util.Date. This is why DataImportHandler/Solr cannot make sense of it and the string representation is being stored in the index.

However, it has a toJdbc() method which will return a JDBC-compatible object.
http://download-uk.oracle.com/otn_hosted_doc/jdeveloper/904preview/jdbc-javadoc/oracle/sql/TIMESTAMP.html#toJdbc()

> 1) So do I need to write a custom transformer to add these values to the
> index?

Yes, it seems like that is the only way.

> 2) And if yes, I am confused how. Is there sample code somewhere?

Yes, see an example here -- http://wiki.apache.org/solr/DIHCustomTransformer

> I have tried the sample TrimTransformer and it is working. But can I
> convert this string to a valid date format? (I am not a java expert..:-( )

I would start by trying something like this:

    oracle.sql.TIMESTAMP timestamp = (oracle.sql.TIMESTAMP) row.get("your_timestamp_field_name");
    row.put("your_timestamp_field_name", timestamp.toJdbc());

--
Regards,
Shalin Shekhar Mangar.
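Putting the pieces together, the transformer class might look roughly like this (an untested sketch; the package, class name and field name are placeholders):

    package my.pkg;

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class OracleTimestampTransformer extends Transformer {
      public Object transformRow(Map<String, Object> row, Context context) {
        Object value = row.get("your_timestamp_field_name");
        if (value instanceof oracle.sql.TIMESTAMP) {
          try {
            // toJdbc() converts the Oracle type to a JDBC-compatible
            // java.sql.Timestamp, which Solr can index as a date
            row.put("your_timestamp_field_name",
                    ((oracle.sql.TIMESTAMP) value).toJdbc());
          } catch (java.sql.SQLException e) {
            throw new RuntimeException(e);
          }
        }
        return row;
      }
    }

It would then be referenced from data-config.xml with transformer="my.pkg.OracleTimestampTransformer" on the entity.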
Re: solrj delete by Id problem
Shalin Shekhar Mangar wrote:
> Did you call commit after the delete?

Of course I call commit, and I have tested both commit(false,false) and commit(true,true); in both cases the result is the same.

>> I have a problem with solrj deleteById. If I search a keyword and it has
>> more than one result (for example 7), and I then delete one of the
>> resulting docs with solrj (server.deleteById) and search this keyword
>> again, the result count is zero. That's not correct; it should be 6, and
>> it should show the other 6 docs.
>>
>> I should mention that when I restart the server the result is correct
>> and it shows the right count (I mean 6 docs).
>>
>> Besides, the problem only occurs with keywords that were searched before
>> deleting the docs; there is no problem with new keywords.
Re: DataImportHandler: UTF-8 and Mysql
On Mon, Jan 12, 2009 at 3:48 PM, gwk wrote:
> 1. Posting UTF-8 data through the example post-script works and I get
>    the proper results back when I query using the admin page.
>    However, for data imported through the DataImportHandler from a MySQL
>    database (the database contains correct data, it's a copy of a
>    production db and selecting through the client gives the correct
>    characters) I get "ó" instead of "ó". I've tried several
>    combinations of arguments to my datasource url
>    (useUnicode=true&characterEncoding=UTF-8) but it does not seem to
>    help. How do I get this to work correctly?

DataImportHandler does not change any encoding. It receives a Java string object from the driver and adds it to Solr. So I'm guessing the problem is in the database or in the driver. Did you create the tables with UTF-8 encoding? Try looking in the MySql driver configuration parameters to force UTF-8. Sorry, I can't be of much help here.

> 2. On the wikipage for DataImportHandler, the deletedPkQuery has no
>    real description, am I correct in assuming it should contain a
>    query which returns the ids of items which should be removed from
>    the index?

Yes, you are right. It should return the primary keys of the rows to be deleted.

> 3. Another question concerning the DataImportHandler wikipage, I'm
>    not sure about the exact way the field-tag works. From the first
>    data-config.xml example for the full-import I can infer that the
>    "column"-attribute represents the column from the sql-query and
>    the "name"-attribute represents the name of the field in the
>    schema the column should map to. However, further on in the
>    RegexTransformer section there are column-attributes which do not
>    correspond to the sql-query result set and it's the "sourceColName"
>    attribute which actually represents that data. That it comes from the
>    RegexTransformer I understand, but why then is the "column"
>    attribute used instead of the "name"-attribute? This has confused
>    me somewhat, any clarification would be greatly appreciated.

DataImportHandler reads by "column" from the resultset and writes by "name" to Solr (or if name is unspecified, by "column"). So "column" is compulsory but "name" is optional. The typical use-case for a RegexTransformer is when you want to read a field (say "a"), process it (save it as "b") and then add it to Solr (by name "c"). So you read by "sourceColName", process and save it as "column", and write to Solr as "name". If "name" is unspecified, it will be written to Solr as "column". The reason we use column and not name is that the user may want to do something more with it, for example use that field in a template and save that template to Solr. I know it is a bit confusing, but it helps us keep DIH general enough.

Hope that helps.

--
Regards,
Shalin Shekhar Mangar.
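Concretely, a made-up example tying the three attributes together:

    <!-- read the raw "name" column, let RegexTransformer derive "firstName"
         from it, and write that value to the Solr field "first_name" -->
    <entity name="person" transformer="RegexTransformer"
            query="select name from person">
      <field column="firstName" name="first_name"
             sourceColName="name" regex="(\w+)\s.*"/>
    </entity>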
RE: Query regarding Spelling Suggestions
Thanks all for the help and information.

Best Regards,
~Mukta

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, January 13, 2009 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

On Tue, Jan 13, 2009 at 5:16 PM, Deshpande, Mukta wrote:
> I have the following questions / observations (just to ensure that my
> configuration is correct):
>
> The Lucene index (dictionary) "syn_index" is already an index, so do we
> have to specify the spellcheckIndexDir again?
> (If I do not give the spellcheckIndexDir I do not get any suggestions.)

The "syn_index" here is the Lucene index you want to use as the source of words. The spell checker processes each token to create n-grams, which are then stored in a Lucene index at the "spellcheckIndexDir" or in memory. This is why you need to specify both sourceLocation and spellcheckIndexDir. If you do not give spellcheckIndexDir, the spell checker will create a Lucene index in memory, so it should still work. Are you sure you gave a build command before issuing the query?

> When I give the build command the spellcheckIndexDir gets populated by
> reading "syn_index". Can we avoid this duplication?

The spell checker needs a Lucene index to work. It creates a new one and adds tokens, after some processing, to this index. There is no way to avoid the creation of another index at present. However, it should be possible to modify it to store its fields inside an existing Lucene index (maybe even Solr's own index). Contributions are always welcome :)

> If the "sourceLocation" is mandatory when using a third-party index
> for spelling suggestions, may I update the Solr wiki to include this
> important information?

Sure, please go ahead. Thanks!

--
Regards,
Shalin Shekhar Mangar.
Re: Clustering Carrot2 + Solr
I've updated the patch for trunk. I _believe_ it should now work.

-Grant

On Jan 8, 2009, at 9:32 AM, Jean-Philip EIMECKE wrote:
> Thanks for considering my problem
> Cheers,
> Jean-Philip Eimecke
What do we mean by Searcher?
Hi,

I am somewhat new to Solr. While reading through documents/resources, I have come across the term 'Searcher' many times. I roughly understand that whenever we fire any query, we are actually invoking a searcher, and this searcher searches through the index and returns results. But I am not able to fully grasp its meaning. I referred to a previous post as well - http://www.nabble.com/what-is-searcher-td15448682.html#a15448682. I have also read through http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Searcher.html#Searcher() but I am still not fully able to appreciate it.

I want to understand Searcher in a practical scenario - We use the Data Import feature of Solr to index database tables. Now, I send a query (*:*) through the Solr Admin console for searching, and I get back search results. In this whole process, I have the following questions -
1. What is the significance of the Searcher in this case?
2. When is the Searcher invoked?
3. Who invokes the Searcher?
4. Where is it stored?
5. When I send another query (manu:abc), will a new Searcher be created?
6. How is the searcher auto-warmed in this case?

Can anyone please direct me to some tutorial/resource for this?

Thanks,
Manu
Re: Clustering Carrot2 + Solr
Thank you so much Grant Cheers -- Jean-Philip Eimecke jpeime...@gmail.com
prefetching question
Hi,

We have 16 million company names and would like to find a way to do "prefetching" using Solr.

Does anyone have experience and/or suggestions?

Thanks,

Jae Joo
Re: DataImportHandler: UTF-8 and Mysql
Shalin Shekhar Mangar wrote:
> On Mon, Jan 12, 2009 at 3:48 PM, gwk wrote:
>> 1. Posting UTF-8 data through the example post-script works and I get
>>    the proper results back when I query using the admin page. However,
>>    for data imported through the DataImportHandler from a MySQL database
>>    (the database contains correct data, it's a copy of a production db
>>    and selecting through the client gives the correct characters) I get
>>    "ó" instead of "ó". I've tried several combinations of arguments
>>    to my datasource url (useUnicode=true&characterEncoding=UTF-8) but it
>>    does not seem to help. How do I get this to work correctly?
>
> DataImportHandler does not change any encoding. It receives a Java string
> object from the driver and adds it to Solr. So I'm guessing the problem is
> in the database or in the driver. Did you create the tables with UTF-8
> encoding? Try looking in the MySql driver configuration parameters to
> force UTF-8. Sorry, I can't be of much help here.

I checked again and you were right: while the columns contained utf8-encoded strings, the actual encoding of the columns was set to latin1. I've fixed the database and now it's working correctly.

>> 2. On the wikipage for DataImportHandler, the deletedPkQuery has no
>>    real description, am I correct in assuming it should contain a
>>    query which returns the ids of items which should be removed from
>>    the index?
>
> Yes, you are right. It should return the primary keys of the rows to be
> deleted.
>
>> 3. Another question concerning the DataImportHandler wikipage, I'm
>>    not sure about the exact way the field-tag works. From the first
>>    data-config.xml example for the full-import I can infer that the
>>    "column"-attribute represents the column from the sql-query and
>>    the "name"-attribute represents the name of the field in the
>>    schema the column should map to. However, further on in the
>>    RegexTransformer section there are column-attributes which do not
>>    correspond to the sql-query result set and it's the "sourceColName"
>>    attribute which actually represents that data. This has confused
>>    me somewhat, any clarification would be greatly appreciated.
>
> DataImportHandler reads by "column" from the resultset and writes by
> "name" to Solr (or if name is unspecified, by "column"). So "column" is
> compulsory but "name" is optional. The typical use-case for a
> RegexTransformer is when you want to read a field (say "a"), process it
> (save it as "b") and then add it to Solr (by name "c"). So you read by
> "sourceColName", process and save it as "column", and write to Solr as
> "name". The reason we use column and not name is that the user may want
> to do something more with it, for example use that field in a template
> and save that template to Solr. I know it is a bit confusing, but it
> helps us keep DIH general enough.
>
> Hope that helps.

Ok, that explains it for me, thanks for the clarification.
Facet Paging
Hi,

With the faceting parameters there is an option to page through a large number of facets. But to create proper paging it would be helpful if the response contained the total number of facets (the number you would get if facet.limit were set to a negative value), similar to an ordinary query response's numFound attribute, so you can determine how many pages there should be. Is it possible to request this information in the same response, and if so, how much does it impact performance?

Regards,

gwk
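(For context, my paging requests currently look like the one below, where the field name and numbers are arbitrary; there is just no way to know the total facet count up front:)

    &facet=true&facet.field=category&facet.limit=20&facet.offset=40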
getting DIH to read my XML files
Hello,

I am trying to use DIH with FileListEntityProcessor to walk the disk and read XML documents. I have a dataConfig.xml as follows:

<dataConfig>
  <document>
    <entity name="jcurrent"
            processor="FileListEntityProcessor"
            fileName=".*xml"
            newerThan="'NOW-1000DAYS'"
            recursive="true"
            rootEntity="false"
            dataSource="null"
            baseDir="/Volumes/spare/ts/j/groups">
      <entity name="x"
              processor="XPathEntityProcessor"
              url="${jcurrent.fileAbsolutePath}"
              stream="false"
              forEach="/record"
              transformer="DateFormatTransformer">
        <field column="fullTitle"   xpath="/record/metadata/subject[@qualifier='fullTitle']"/>
        <field column="publication" xpath="/record/metadata/subject[@qualifier='publication']"/>
        <field column="pubAbbrev"   xpath="/record/metadata/subject[@qualifier='pubAbbrev']"/>
        <field column="pubDate"     xpath="/record/metadata/date[@qualifier='pubDate']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

But when I try and start the walker I get:

INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2
commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_1,version=1231861070710,generation=1,filenames=[segments_1]
commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_2,version=1231861070711,generation=2,filenames=[segments_2]
Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1231861070711
Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: jcurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :null available for entity :x Processing Document # 1
at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :null available for entity :x Processing Document # 1
at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)

Anybody able to point out what I have done wrong?

Regards Fergus.
--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===
Re: getting DIH to read my XML files
Which version of Solr are you using?

I think there should be a dataSource="null" in the child entity as well.

On Tue, Jan 13, 2009 at 9:28 PM, Fergus McMenemie wrote:
> Hello,
>
> I am trying to use DIH with FileListEntityProcessor to walk the
> disk and read XML documents. I have a dataConfig.xml as follows:
>
> <dataConfig>
>   <document>
>     <entity name="jcurrent"
>             processor="FileListEntityProcessor"
>             fileName=".*xml"
>             newerThan="'NOW-1000DAYS'"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null"
>             baseDir="/Volumes/spare/ts/j/groups">
>       <entity name="x"
>               processor="XPathEntityProcessor"
>               url="${jcurrent.fileAbsolutePath}"
>               stream="false"
>               forEach="/record"
>               transformer="DateFormatTransformer">
>         <field column="fullTitle"   xpath="/record/metadata/subject[@qualifier='fullTitle']"/>
>         <field column="publication" xpath="/record/metadata/subject[@qualifier='publication']"/>
>         <field column="pubAbbrev"   xpath="/record/metadata/subject[@qualifier='pubAbbrev']"/>
>         <field column="pubDate"     xpath="/record/metadata/date[@qualifier='pubDate']"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> But when I try and start the walker I get:
>
> SEVERE: Exception while processing: jcurrent document : null
> org.apache.solr.handler.dataimport.DataImportHandlerException: No
> dataSource :null available for entity :x Processing Document # 1
> ...
> SEVERE: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException: No
> dataSource :null available for entity :x Processing Document # 1
> ...
>
> Anybody able to point out what I have done wrong?
>
> Regards Fergus.
> --
> ===
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
> Unix/Mac/Intranets             Analyst Programmer
> ===

--
Regards,
Shalin Shekhar Mangar.
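That is, the child entity would carry the attribute too, something like:

    <entity name="x" dataSource="null"
            processor="XPathEntityProcessor"
            url="${jcurrent.fileAbsolutePath}"
            forEach="/record" ...>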
Re: getting DIH to read my XML files
Shalin, thanks for the speedy response.

> Which version of Solr are you using?

Solr Implementation Version: nightly exported - yonik - 2008-11-13 08:05:48

> I think there should be a dataSource="null" in the child entity as well.

OK, that had an effect; I now get:

Jan 13, 2009 4:42:28 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1231864933487
Jan 13, 2009 4:42:28 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: janescurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/Volumes/spare/ts/janes/dtd/janesxml/data/news/jtic/groups/jwit0009.xmlrows processed :0 Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:283)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
... 9 more
Caused by: java.lang.NullPointerException
at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:81)
... 10 more
Jan 13, 2009 4:42:28 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/Volumes/spare/ts/janes/dtd/janesxml/data/news/jtic/groups/jwit0009.xmlrows processed :0 Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:283)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
... 9 more
Caused by: java.lang.NullPointerException
at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
Re: What do we mean by Searcher?
Manu,

If you truly want to get a better feeling for the notion of a Searcher, my advice is to play with Lucene a little bit first. Do you have a copy of Lucene in Action? You can get a cheaper version online at manning.com/hatcher2 if you want, and quickly read a bit about Searcher in one of the early chapters. In short, the searcher is the object/the thing that performs searches against an index. More answers to your questions below.

> We use Data Import feature of Solr to index database tables. Now, I send a
> query (*:*) through Solr Admin console for searching. And I get back search
> results. In this whole process, I have the following questions -

> 1. What is the significance of the Searcher in this case?
The searcher is the thing that performed the search. It took your query string, opened an index, ran the search, and got results.

> 2. When is the Searcher invoked?
When you run a search request.

> 3. Who invokes the Searcher?
You do, when you call one of the SearchComponents or RequestHandlers, when you run a search request.

> 4. Where is it stored?
A Searcher is not really "stored". It's a piece of code that runs inside Solr, which runs inside a servlet container, which runs inside a JVM, and so on.

> 5. When I send another query (manu:abc), will a new Searcher be created?
No, the same searcher will be used unless you told Solr to open a new Searcher.

> 6. How is the searcher auto-warmed in this case?
http://wiki.apache.org/solr/?action=fullsearch&context=180&value=autowarm&fullsearch=Text

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Manupriya
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 9:25:02 AM
> Subject: What do we mean by Searcher?
>
> Hi,
>
> I am somewhat new to Solr. While reading through documents/resources, I
> have come across the term 'Searcher' many times. I roughly understand
> that whenever we fire any query, we are actually invoking a searcher.
> This searcher searches through the index and returns results.
>
> But I am not able to fully grasp its meaning. I referred to a previous
> post as well -
> http://www.nabble.com/what-is-searcher-td15448682.html#a15448682.
>
> I have also read through
> http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Searcher.html#Searcher()
>
> But I am still not fully able to appreciate it.
>
> I want to understand Searcher in a practical scenario -
>
> We use the Data Import feature of Solr to index database tables. Now, I
> send a query (*:*) through the Solr Admin console for searching. And I
> get back search results. In this whole process, I have the following
> questions -
> 1. What is the significance of the Searcher in this case?
> 2. When is the Searcher invoked?
> 3. Who invokes the Searcher?
> 4. Where is it stored?
> 5. When I send another query (manu:abc), will a new Searcher be created?
> 6. How is the searcher auto-warmed in this case?
>
> Can anyone please direct me to some tutorial/resource for this?
>
> Thanks,
> Manu
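If you want to see it in code, the Lucene-level idea (a sketch against the 2.3-era API; the index path and field are placeholders) is roughly:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class SearcherDemo {
      public static void main(String[] args) throws Exception {
        // a searcher opens an index and runs queries against it
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Hits hits = searcher.search(new TermQuery(new Term("manu", "abc")));
        System.out.println(hits.length() + " matching docs");
        searcher.close();
      }
    }

Solr wraps this in its own SolrIndexSearcher, keeps it open across requests, and builds caches against it; that is why opening a new one (on commit) involves autowarming.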
Re: How to achieve combination of features grouping, scoring...
Hi,

I don't think you can do any of that with Solr as it exists today. My feeling is that you might want to model this new functionality/code after what's in SOLR-236, even though it's not the same thing as yours, or after the Carrot2 plugin. I also have a feeling others might like this functionality too, so if you can generalize and contribute, please consider doing that.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Norbert Hartl
> To: SOLR mailing list
> Sent: Tuesday, January 13, 2009 3:19:33 AM
> Subject: How to achieve combination of features grouping, scoring...
>
> Hi,
>
> I spent some time on Solr in order to figure out what it can do. I still
> have some problems finding the right way to do my search.
>
> I have a bunch of heterogeneous objects that I want to search. All of
> these objects belong to an owner. When a search is issued I'd like not
> only to find the individual objects but also to have them grouped by
> their owner.
>
> For grouping I didn't find much of value other than doing it in a
> response writer. I tried collapsing, but that is not what I mean, and
> facets are something different again. The only thing I found is the
> XSLTResponseWriter, which groups stuff after the fact.
>
> What is the best way to achieve this:
>
> - how to group results when there are many of them to take into account
> - how to score based on grouped objects. Grouping in the response writer
>   is not hard, but if I want to do pagination I'd like to have the
>   top-scored group at the top of the results. Is there a way to do so?
> - I'd like to show only the fields that match a query. As someone hinted
>   here on the ML, doing this with highlighting is the only way I found.
>   But then I don't understand why I can provide a field list (hl.fl) yet
>   it does not take a * for every field like some of the other parameters
>   do.
>
> Thanks in advance,
>
> Norbert
Re: prefetching question
Maybe it's just me, but I'm not sure what you mean by "prefetching". (I don't even know if you're talking about an indexing-time activity or a query-time activity.) My guess is that you'll get a more helpful reply if you can make your question more specific. Cheers, Chris On Tue, Jan 13, 2009 at 6:51 AM, Jae Joo wrote: > Hi, > > We do have 16 millions of company name and would like to find the way for > "prefetching" by using Solr. > > Does anyone have experience and/or suggestions? > > Thanks, > > Jae Joo >
RE: Committing index while time-consuming query is running
I believe that when you commit, a new IndexReader is created, which is warmed, etc. New incoming queries will be sent to this new IndexReader. Once all previously existing queries have been answered, the old IndexReader will shut down.

The commit doesn't wait for the query to finish, but it shouldn't impact the results of that query either. What may be impacted is overall system performance while you have 2 IndexReaders in play. There will always be some amount of overlap, but it may be drawn out by the long query.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Tuesday, January 13, 2009 2:18 PM
To: solr-user@lucene.apache.org
Subject: Committing index while time-consuming query is running

Once in a while my Solr instance receives a query that takes a really long time to execute (several minutes or more). What will happen if I update my index (and commit) while one of these really long queries is executing? Will Solr wait for the query to complete before it commits my update? (on a side note, I'm re-working my UI to eliminate these queries)

Thanks!
Re: Non-fixed highlight snippet size
On 13-Jan-09, at 12:48 AM, Marc Sturlese wrote:
> Hey there,
> I need a rule in my highlights that sets, for example, the snippet size
> to 400 in case there's just one snippet, 225 in case two snippets are
> found, and 125 in case 3 or more snippets are found. Is there any way to
> do that via solrconfig.xml (from what I have seen I don't think so...)
> or should I code a plugin? In the second case, do I need an extended
> class of GapFragmenter, or is what I should hack in another piece of the
> source?
> Thanks in advance

There is no easy way to accomplish that, due to the architecture of the highlighter (which first generates fragments and only then determines whether they are snippets that contain the keyword(s)).

-Mike
Indexing the same data in many records
Hi,

I'd like to use Solr to index some webserver logs, in order to allow easy ad-hoc querying and analysis. Each Solr Document will represent a single request to the webserver, with fields for time, request URL, referring URL etc.

I'm also planning to fetch the page source of each referring URL, and add that as an indexed field in the Solr document. The aim is to allow queries like "find hits to /xyz.html where the referring page contains the word 'foobar'".

Since hundreds or even thousands of hits may all come from the same referring page, would this approach be horribly inefficient? (Note the page source won't be stored in each Document, just indexed). Am I going to dramatically increase the index size if I do this?

If so, is there a more elegant way to do what I want?

Many thanks,
Phil
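(In schema.xml terms, what I have in mind for the referring-page field is simply, with an invented name:

    <field name="referrer_content" type="text" indexed="true" stored="false"/>

My understanding is that an indexed-only field adds postings, i.e. terms plus document pointers, but no stored bytes, so repeating the same page text across many documents mostly grows the posting lists rather than duplicating the text itself.)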
faceted search returning multiple values for same field
Hi,

I am using solr for indexing some product data, and wanted to use the faceted search. My indexed field (mfg) sometimes contains two words, "Sony Ericsson" for example. When I get the facets on mfg, Solr returns "sony" and "ericsson" as separate hits. There are also some facets that show up rather mysteriously.

My unique list of mfg's that is indexed is as follows:
AT&T
BlackBerry
HTC
LG
Motorola
Nokia
Option
Palm
Pantech
Samsung
Sierra Wireless
Sony Ericsson

The resulting facets being returned are below:

"facet_fields":{
  "mfg":[
    "ericsson",195,
    "soni",156,
    "samsung",155,
    "nokia",90,
    "Ericsson",78,
    "Sony",78,
    "Samsung",62,
    "motorola",55,
    "lg",50,
    "sony",39,
    "Nokia",36,
    "pantech",25,
    "Motorola",22,
    "LG",20,
    "berri",16,
    "black",16,
    "blackberri",16,
    "Pantech",10,
    "BlackBerry",8,
    "blackberry",4,
    "AT",0,
    "HTC",0,
    "Option",0,
    "Palm",0,
    "Sierra",0,
    "T",0,
    "Wireless",0,
    "at",0,
    "att",0,
    "htc",0,
    "option",0,
    "palm",0,
    "sierra",0,
    "t",0,
    "wireless",0]

I have tried playing around with defining the fieldtype using analyzers along these lines:

  <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="..."/>
      <filter class="..." words="manufacturer.txt"/>
    </analyzer>
  </fieldType>

Any ideas if it's possible to get the same facets as are in the data being indexed, or would I have to write my own filter for this purpose?

Thanks
Shantanu Deo
AT&T eCommerce Web Hosting - Release Management
Office: (425)288-6081
email: sd1...@att.com
Re: faceted search returning multiple values for same field
On Wed, Jan 14, 2009 at 8:45 AM, Deo, Shantanu wrote:
> I have tried playing around with defining the fieldtype using analyzers.
>
> Any ideas if it's possible to get the same facets as are in the data
> being indexed, or would I have to write my own filter for this purpose?

Faceting works on the indexed terms. Therefore, you should make sure that what you index is exactly what you stored. You probably need to facet on a "string" type.

--
Regards,
Shalin Shekhar Mangar.
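For example, you could keep a tokenized copy for search and an untokenized copy just for faceting (the mfg_exact name is invented):

    <field name="mfg"       type="text"   indexed="true" stored="true"/>
    <field name="mfg_exact" type="string" indexed="true" stored="false"/>
    <copyField source="mfg" dest="mfg_exact"/>

and then facet on the untokenized copy:

    ...&facet=true&facet.field=mfg_exact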
Re: faceted search returning multiple values for same field
Shantanu,

It sounds like all you have to do is switch to a field type that doesn't tokenize your mfg field. Try field type "string". You'll need to reindex once you make this change.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: "Deo, Shantanu"
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 10:15:09 PM
> Subject: faceted search returning multiple values for same field
>
> Hi,
> I am using solr for indexing some product data, and wanted to use the
> faceted search. My indexed field (mfg) sometimes contains two words,
> "Sony Ericsson" for example. When I get the facets on mfg, Solr returns
> "sony" and "ericsson" as separate hits. There are also some facets that
> show up rather mysteriously.
>
> My unique list of mfg's that is indexed is as follows:
> AT&T, BlackBerry, HTC, LG, Motorola, Nokia, Option, Palm, Pantech,
> Samsung, Sierra Wireless, Sony Ericsson
>
> The resulting facets being returned are below:
> "facet_fields":{
>   "mfg":[
>     "ericsson",195, "soni",156, "samsung",155, "nokia",90,
>     "Ericsson",78, "Sony",78, "Samsung",62, "motorola",55, "lg",50,
>     "sony",39, "Nokia",36, "pantech",25, "Motorola",22, "LG",20,
>     "berri",16, "black",16, "blackberri",16, "Pantech",10,
>     "BlackBerry",8, "blackberry",4, "AT",0, "HTC",0, "Option",0,
>     "Palm",0, "Sierra",0, "T",0, "Wireless",0, "at",0, "att",0,
>     "htc",0, "option",0, "palm",0, "sierra",0, "t",0, "wireless",0]
>
> I have tried playing around with defining the fieldtype using analyzers.
>
> Any ideas if it's possible to get the same facets as are in the data
> being indexed, or would I have to write my own filter for this purpose?
>
> Thanks
> Shantanu Deo
> AT&T eCommerce Web Hosting - Release Management
> Office: (425)288-6081
> email: sd1...@att.com
Re: Indexing the same data in many records
Phil,

From what you described so far, I don't see any red flags. I would pay attention to reading those timestamps (covered on the Wiki and ML archives), that's all.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: philmccarthy
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 8:49:33 PM
> Subject: Indexing the same data in many records
>
> Hi,
>
> I'd like to use Solr to index some webserver logs, in order to allow easy
> ad-hoc querying and analysis. Each Solr Document will represent a single
> request to the webserver, with fields for time, request URL, referring URL
> etc.
>
> I'm also planning to fetch the page source of each referring URL, and add
> that as an indexed field in the Solr document. The aim is to allow queries
> like "find hits to /xyz.html where the referring page contains the word
> 'foobar'".
>
> Since hundreds or even thousands of hits may all come from the same
> referring page, would this approach be horribly inefficient? (Note the page
> source won't be stored in each Document, just indexed). Am I going to
> dramatically increase the index size if I do this?
>
> If so, is there a more elegant way to do what I want?
>
> Many thanks,
> Phil
Re: Custom Transformer to handle Timestamp
Thanks Shalin, that really helped. I have created a plugin class and now things are working fine.

Thanks again,

Regards
Con

Shalin Shekhar Mangar wrote:
> Hmm, interesting. It seems oracle.sql.TIMESTAMP does not inherit from
> java.sql.Timestamp or java.util.Date. This is why DataImportHandler/Solr
> cannot make sense of it and the string representation is being stored in
> the index.
>
> However, it has a toJdbc() method which will return a JDBC-compatible
> object.
> http://download-uk.oracle.com/otn_hosted_doc/jdeveloper/904preview/jdbc-javadoc/oracle/sql/TIMESTAMP.html#toJdbc()
>
>> 1) So do I need to write a custom transformer to add these values to the
>> index?
>
> Yes, it seems like that is the only way.
>
>> 2) And if yes, I am confused how. Is there sample code somewhere?
>
> Yes, see an example here --
> http://wiki.apache.org/solr/DIHCustomTransformer
>
>> I have tried the sample TrimTransformer and it is working. But can I
>> convert this string to a valid date format? (I am not a java expert..:-( )
>
> I would start by trying something like this:
>
> oracle.sql.TIMESTAMP timestamp = (oracle.sql.TIMESTAMP)
> row.get("your_timestamp_field_name");
> row.put("your_timestamp_field_name", timestamp.toJdbc());
>
> --
> Regards,
> Shalin Shekhar Mangar.
CommonsHttpSolrServer in multithreaded env
Hi all,

Is it safe to use a single instance of a CommonsHttpSolrServer object in a multithreaded environment? I have multiple threads accessing a single static CommonsHttpSolrServer object, but sometimes the application gets blocked. The following is the stack trace printed for all threads:

"indexthread1" Id=47 prio=5 RUNNABLE (in native) Blocked (cnt): 5853; Waited (cnt): 30
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked java.io.BufferedInputStream@147d387
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:335)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:85)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:74)
at wt.index.SolrIndexDelegate.index(SolrIndexDelegate.java:84)
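For reference, the trace shows commons-httpclient's MultiThreadedHttpConnectionManager, i.e. the thread-safe connection manager that CommonsHttpSolrServer uses for concurrent access, and the blocked thread is sitting in a socket read with no timeout. A sketch of creating one shared instance with explicit timeouts (SolrJ's CommonsHttpSolrServer setters; the URL and values are arbitrary), so a stuck read cannot block forever:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class SharedSolrServerFactory {
      public static CommonsHttpSolrServer create() throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setConnectionTimeout(5000);          // ms to establish a connection
        server.setSoTimeout(30000);                 // ms to wait on a socket read
        server.setDefaultMaxConnectionsPerHost(20); // concurrent connections per host
        server.setMaxTotalConnections(100);
        return server;
      }
    }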
Searchable and Non Searchable Fields
Hi All,

I am using the dataimporthandler to index values from an oracle db. My sample rows are like:
1) FirstName -> George,   LastName -> Bush,       Country -> US
2) FirstName -> Georgeon, LastName -> Washington, Country -> US
3) FirstName -> Tony,     LastName -> George,     Country -> UK
4) FirstName -> Gordon,   LastName -> Brown,      Country -> UK
5) FirstName -> Vladimer, LastName -> Putin,      Country -> Russia

How can I make only the FirstName field searchable? For example, if I search for George, I should get the FirstName, LastName and Country of the first and second rows only, and if I search for Bush no results should be returned.

I tried providing various options in the field definitions in schema.xml, but it is not giving the exact results. How can I change the field attributes to get this result? Or is there some other config for this?

Thanks in advance,
con
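If it helps, what I think I need is a schema along these lines (field types assumed):

    <field name="FirstName" type="text"   indexed="true"  stored="true"/>
    <field name="LastName"  type="string" indexed="false" stored="true"/>
    <field name="Country"   type="string" indexed="false" stored="true"/>

i.e. LastName and Country stay retrievable (stored) but never match a query, so q=FirstName:George with fl=FirstName,LastName,Country would return rows 1 and 2 while a search on Bush returns nothing. I understand I would need to reindex after changing these attributes.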
Re: Issue in Facet on date field
: I have to create two facets on a date field:
: 1) First Facet will have results between two date range , i.e. [NOW TO
: NOW+45DAYS]
: 2) Second Facet will have results between two date range , i.e. [NOW-45DAYS
: TO NOW]

the date faceting code is designed to generate counts for regular intervals of time (specified by "gap") between a fixed start and end. you could probably get what you want with something like...

   facet.date.start = NOW-45DAYS
   facet.date.end = NOW+45DAYS
   facet.date.gap = +45DAYS

...but to be perfectly honest, if you know you want exactly two counts, one for the last 45 days and one for the next 45 days, then date faceting is overkill (and overly complicated) for your use case ... just use facet queries...

   facet.query=productPublicationDate_product_dt:[NOW-45DAYS TO NOW]
   facet.query=productPublicationDate_product_dt:[NOW TO NOW+45DAYS]

BTW: you'll probably want to replace "NOW" with "NOW/DAY" or "NOW/HOUR" to round down and get better cache utilization.

-Hoss