How to achieve combination of features grouping, scoring...

2009-01-13 Thread Norbert Hartl
Hi,

I spent some time on Solr in order to figure out what
it can do. I still have some problems finding the right
way to do my search.

I have a bunch of heterogeneous objects that I want to
search. All of these objects belong to an owner. When
a search is issued I'd like not only to find the individual
objects but also to have them grouped by their owner.

For grouping I didn't find much of value other than doing
it in a response writer. I tried collapsing but
that is not what I mean. And facets are still something
different. The only thing I found is the XSLTResponseWriter,
which can group the results afterwards.

What is the best way to achieve this:

- how to group results when there are many of them to take
  into account
- how to score based on grouped objects. Grouping in
  the response writer is not hard. But if I want to do
  pagination I'd like to have the top-scored group at the
  top of the results. Is there a way to do so?
- I'd like to show only the fields that match a query. As
  someone hinted here on the ML, doing this with highlighting
  is the only way I found. But then I don't understand why
  I can provide a field list (hl.fl) but it does not take
  a * for every field like some of the other parameters do.

Thanks in advance,

Norbert



Issue in Facet on date field

2009-01-13 Thread prerna07

Hi,

I have to create two facets on a date field:
1) The first facet should cover the date range [NOW TO NOW+45DAYS]
2) The second facet should cover the date range [NOW-45DAYS TO NOW]

I want both results in a single query. The query I am using is
below:

&facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS

ISSUE:
I am getting the same response in both nodes; one facet's parameters are
overriding the other's:

<lst name="facet_dates">
  <lst name="productPublicationDate_product_dt">
    <int name="...">0</int>
    <str name="gap">+45DAYS</str>
    <date name="end">2009-02-27T08:37:26.662Z</date>
  </lst>
  <lst name="productPublicationDate_product_dt">
    <int name="...">0</int>
    <str name="gap">+45DAYS</str>
    <date name="end">2009-02-27T08:37:26.662Z</date>
  </lst>
</lst>

Please suggest a way by which I can differentiate these two date facets in
the query.
-- 
View this message in context: 
http://www.nabble.com/Issue-in-Facet-on-date-field-tp21431422p21431422.html
Sent from the Solr - User mailing list archive at Nabble.com.



non fix highlight snippet size

2009-01-13 Thread Marc Sturlese

Hey there,
I need a rule in my highlights that sets, for example, the snippet size to
400 in case there's just one snippet, 225 in case two snippets are found, and
125 in case 3 or more snippets are found. Is there any way to do that via
solrconfig.xml (from what I have seen I don't think so...) or should I code a
plugin? In the second case, do I need an extended class of GapFragmenter, or
is it another piece of the source I should hack?
Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/non-fix-highlight-snippet-size-tp21431456p21431456.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issue in Facet on date field

2009-01-13 Thread Marc Sturlese

Hey,
That's because Solr just looks for one set of start, end and gap params in
solrconfig.xml. It allows you to do date faceting on different fields,
but only in one range period.
I was in the same situation as you are; what I did was modify the function
getFacetDateCounts() in SimpleFacets to make it accept as many params
(start/end/gap) as I want. Once that's done I can do date faceting on all
time periods.
Result would look like:



<lst name="facet_dates">
  <lst name="...">
    <int name="...">2238</int>
    <str name="gap">+3MONTH</str>
    <date name="end">2009-01-13T00:00:00Z</date>
    <int name="...">3822</int>
    <str name="gap">+6MONTH</str>
    <date name="end">2009-01-13T00:00:00Z</date>
    <int name="...">3864</int>
    <str name="gap">+1YEAR</str>
    <date name="end">2009-01-13T00:00:00Z</date>
  </lst>
</lst>



Doing facets for the last year, 6 months and 3 months.
I don't think there's a way to do that without modifying the source (if you
find one let me know :D)
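
By the way, a workaround that should not require patching, assuming plain
range queries are acceptable for your case, is to give each range its own
facet.query (values URL-encoded as usual when sent over HTTP):

    &facet=true
    &facet.query=productPublicationDate_product_dt:[NOW TO NOW+45DAYS]
    &facet.query=productPublicationDate_product_dt:[NOW-45DAYS TO NOW]

Each facet.query comes back as its own entry under facet_queries, so the two
ranges cannot override each other.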



prerna07 wrote:
> 
> Hi,
> 
> I have to create two facets on a date field:
> 1) The first facet should cover the date range [NOW TO NOW+45DAYS]
> 2) The second facet should cover the date range [NOW-45DAYS TO NOW]
> 
> I want both results in a single query. The query I am using is
> below:
> 
> &facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS
> 
> ISSUE:
> I am getting the same response in both nodes; one facet's parameters are
> overriding the other's:
> 
> <lst name="facet_dates">
>   <lst name="productPublicationDate_product_dt">
>     <int name="...">0</int>
>     <str name="gap">+45DAYS</str>
>     <date name="end">2009-02-27T08:37:26.662Z</date>
>   </lst>
>   <lst name="productPublicationDate_product_dt">
>     <int name="...">0</int>
>     <str name="gap">+45DAYS</str>
>     <date name="end">2009-02-27T08:37:26.662Z</date>
>   </lst>
> </lst>
> 
> Please suggest a way by which I can differentiate these two date facets
> in the query.
> 

-- 
View this message in context: 
http://www.nabble.com/Issue-in-Facet-on-date-field-tp21431422p21431727.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issue in Facet on date field

2009-01-13 Thread prerna07


There can be two other options:
1) Make two Solr queries to get the two facets.
2) Use a copyField in schema.xml (see the sketch below).
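
For option 2, the schema.xml change could look something like this (a
sketch; the second field name is illustrative):

    <field name="pubDatePast_dt" type="date" indexed="true" stored="false"/>
    <copyField source="productPublicationDate_product_dt" dest="pubDatePast_dt"/>

Each field can then carry its own f.<fieldname>.facet.date.start/end/gap
parameters.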

Thanks,
Prerna


Marc Sturlese wrote:
> 
> Hey,
> That's because Solr just looks for one set of start, end and gap params
> in solrconfig.xml. It allows you to do date faceting on different
> fields, but only in one range period.
> I was in the same situation as you are; what I did was modify the
> function getFacetDateCounts() in SimpleFacets to make it accept as many
> params (start/end/gap) as I want. Once that's done I can do date
> faceting on all time periods.
> Result would look like:
> 
> <lst name="facet_dates">
>   <lst name="...">
>     <int name="...">2238</int>
>     <str name="gap">+3MONTH</str>
>     <date name="end">2009-01-13T00:00:00Z</date>
>     <int name="...">3822</int>
>     <str name="gap">+6MONTH</str>
>     <date name="end">2009-01-13T00:00:00Z</date>
>     <int name="...">3864</int>
>     <str name="gap">+1YEAR</str>
>     <date name="end">2009-01-13T00:00:00Z</date>
>   </lst>
> </lst>
> 
> Doing facets for the last year, 6 months and 3 months.
> I don't think there's a way to do that without modifying the source (if
> you find one let me know :D)
> 
> 
> 
> prerna07 wrote:
>> 
>> Hi,
>> 
>> I have to create two facets on a date field:
>> 1) The first facet should cover the date range [NOW TO NOW+45DAYS]
>> 2) The second facet should cover the date range [NOW-45DAYS TO NOW]
>> 
>> I want both results in a single query. The query I am using is
>> below:
>> 
>> &facet=true&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW&f.productPublicationDate_product_dt.facet.date.end=NOW+45DAYS&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS&facet.date=productPublicationDate_product_dt&f.productPublicationDate_product_dt.facet.date.start=NOW-45DAYS&f.productPublicationDate_product_dt.facet.date.end=NOW&f.productPublicationDate_product_dt.facet.date.gap=%2B45DAYS
>> 
>> ISSUE:
>> I am getting the same response in both nodes; one facet's parameters
>> are overriding the other's:
>> 
>> <lst name="facet_dates">
>>   <lst name="productPublicationDate_product_dt">
>>     <int name="...">0</int>
>>     <str name="gap">+45DAYS</str>
>>     <date name="end">2009-02-27T08:37:26.662Z</date>
>>   </lst>
>>   <lst name="productPublicationDate_product_dt">
>>     <int name="...">0</int>
>>     <str name="gap">+45DAYS</str>
>>     <date name="end">2009-02-27T08:37:26.662Z</date>
>>   </lst>
>> </lst>
>> 
>> Please suggest a way by which I can differentiate these two date facets
>> in the query?
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Issue-in-Facet-on-date-field-tp21431422p21431934.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting only fields that match

2009-01-13 Thread Norbert Hartl
On Sun, 2009-01-11 at 17:07 +0530, Shalin Shekhar Mangar wrote:
> On Sun, Jan 11, 2009 at 4:02 PM, Norbert Hartl  wrote:
> 
> >
> > I like the search result to include only the fields
> > that matched the search. Is this possible? I only
> > saw the field spec where you can have a certain set
> > of fields or all.
> 
> 
> Are you looking for highlighting (snippets)?
> 
> http://wiki.apache.org/solr/HighlightingParameters
> 
> A Field can be indexed (searchable) or stored (retrievable) or both. When
> you make a query to Solr, you yourself specify which fields it needs to
> search on. If they are stored, you can ask to retrieve those fields only.
> Not sure if that answers your question.
> 
Having another look at your proposal I can see you might
be right :) It seems to me the most doable approach for now, too.
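
For reference, a request along those lines would look roughly like this
(field names are just examples):

    &hl=true&hl.fl=name,features&hl.snippets=1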

thanks,

Norbert



solrj delete by Id problem

2009-01-13 Thread Parisa

I have a problem with solrj deleteById. If I search a keyword that has
more than one result (for example 7), then delete one of the resulting docs
with solrj (server.deleteById) and search the keyword again, the result
count is zero. That's not correct: it should be 6, and the other 6 docs
should still show up.


I should mention that when I restart the server the result is correct
again (I mean 6 docs).

Besides, the problem only occurs with keywords that were searched before
deleting the docs; there is no problem with new keywords.

-- 
View this message in context: 
http://www.nabble.com/solrj-delete-by-Id-problem-tp21433056p21433056.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Query regarding Spelling Suggestions

2009-01-13 Thread Deshpande, Mukta
Hi Grant,

My spellcheck is now working fine with the following configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">word</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">word</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="sourceLocation">d:\solr-tomcat\solr\data\syn_index</str>
    <str name="spellcheckIndexDir">./spellcheckerFile1</str>
  </lst>
</searchComponent>

Earlier I had configured the lucene-index (dictionary) "syn_index" as the
spellcheckIndexDir, as I interpreted from the wiki page.
Then I looked into the file IndexBasedSpellChecker.java and found
the usage of "sourceLocation".
When I configured my lucene-index (dictionary) "syn_index" as the
"sourceLocation", the IndexBasedSpellChecker worked.

I have the following questions / observations (just to ensure that my
configuration is correct):

The lucene-index (dictionary) "syn_index" is already an index, so do we
have to specify the spellcheckIndexDir again?
(If I do not give the spellcheckIndexDir I do not get any
suggestions.)
When I give the build command the spellcheckIndexDir gets populated
by reading "syn_index". Can we avoid this duplication?

If the "sourceLocation" is mandatory when using a third-party index for
spelling suggestions, may I update the Solr wiki to include this
important information?

Thanks & Best Regards,
~Mukta

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Monday, January 12, 2009 10:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

Solr 1.3 doesn't use Log4J; it uses Java util logging (JUL).  I
believe the info level in the logs is sufficient.  Let's start by
posting what you have.

Also, are you able to get the sample spellchecking to work?

On Jan 12, 2009, at 2:16 AM, Deshpande, Mukta wrote:

> Hi,
>
> Could you please send me the needful entries in log4j.properties to 
> enable logging, explicitly for SpellCheckComponent.
>
> My current log4j.properties looks like:
>
> log4j.rootLogger=INFO,console
> log4j.appender.console=org.apache.log4j.ConsoleAppender
> log4j.appender.console.target=System.err
> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd
> HH:mm:ss} %p
> %c{2}: %m%n
> log4j.logger.org.apache.solr=DEBUG
>
> With these settings I can only see the INFO level logs.
>
> I tried to change the log level for SpellCheckComponent to "FINE"  
> using
> the admin logging page http://localhost:8080/solr/admin/logging but 
> did not see any difference in logging.
>
> Thanks,
> ~Mukta
>
> -Original Message-
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Monday, January 12, 2009 3:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query regarding Spelling Suggestions
>
> Can you send the full log?
>
> On Jan 11, 2009, at 1:51 PM, Deshpande, Mukta wrote:
>
>> I am using the example schema that comes with the Solr installation 
>> downloaded from http://www.mirrorgeek.com/apache.org/lucene/solr/.
>> I have added the "word" field with the "textSpell" fieldtype in the
>> schema.xml file, as specified in the mail below.
>>
>> My spelling index exists under /data/. If I open my index in
>> Luke I can see the entries against the "word" field.
>>
>> Thanks,
>> ~Mukta
>>
>>
>> 
>>
>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>> Sent: Fri 1/9/2009 8:29 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Query regarding Spelling Suggestions
>>
>>
>>
>> Can you put the full log (as short as possibly demonstrates the
>> problem) somewhere where I can take a look?  Likewise, can you share 
>> your schema?
>>
>> Also, does the spelling index exist under /data/index? If you open it
>> w/ Luke, does it have entries?
>>
>> Thanks,
>> Grant
>>
>> On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote:
>>
>>>
>>> Yes. I sent the build command as:
>>> http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict
>>>
>>> The Tomcat log shows:
>>> Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/select/
>>> params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true}
>>> hits=0 status=0 QTime=141
>>>
>>> Even after sending the build command I do not get any suggestions.
>>> Can you please check.
>>>
>>> Thanks,
>>> ~Mukta
>>>
>>> -Original Message-
>>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>>> Sent: Thursday, January 08, 2009 7:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Query regarding Spelling Suggestions
>>>
>>> Did you send in the build command?  See 
>>> http://wiki.apache.org/solr/SpellCheckComponent
>>>
>>> On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using the Wordnet dictionary for spelling suggestions.
>>>>
>>>> The dictionary is converted to a Solr index with only one field "word"
>>>> and stored in location /data/syn_index, using the syns2Index.java
>>>> program

Re: solrj delete by Id problem

2009-01-13 Thread Shalin Shekhar Mangar
Did you call commit after the delete?
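
Something along these lines (a sketch; the URL and id are illustrative, and
exception handling is omitted):

    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.deleteById("42");
    // deletes stay invisible to searchers until a commit opens a new one
    server.commit();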

On Tue, Jan 13, 2009 at 4:12 PM, Parisa  wrote:

>
> I have a problem with solrj deleteById. If I search a keyword that has
> more than one result (for example 7), then delete one of the resulting
> docs with solrj (server.deleteById) and search the keyword again, the
> result count is zero. That's not correct: it should be 6, and the other
> 6 docs should still show up.
>
>
> I should mention that when I restart the server the result is correct
> again (I mean 6 docs).
>
> Besides, the problem only occurs with keywords that were searched before
> deleting the docs; there is no problem with new keywords.
>
> --
> View this message in context:
> http://www.nabble.com/solrj-delete-by-Id-problem-tp21433056p21433056.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Query regarding Spelling Suggestions

2009-01-13 Thread Shalin Shekhar Mangar
On Tue, Jan 13, 2009 at 5:16 PM, Deshpande, Mukta  wrote:

> I have following question / observation : (just to ensure that my
> configurations are correct)
>
> The lucene-index (dictionary) "syn_index" is already an index so do we
> have to specify the spellcheckIndexDir again?
>(If I do not give the spellcheckIndexDir I do not get any
> suggestions.)


The "syn_index" here is your Lucene index from which you want to use as the
source for words. Spell checker processes each token to create n-grams which
are then stored into a lucene index at the "spellCheckIndexDir" or in
memory. This is why you need to specify both sourceLocation and
spellcheckIndexDir.

If you do not give spellCheckIndexDir, spell checker will create a Lucene
index in-memory, so it should still work. Are you sure you gave a build
command before issuing the query?


> When I give the build command the spellcheckIndexDir gets populated
> reading the "syn_index". Can we avoid this duplication?


The spell checker needs a Lucene index to work. It creates a new one and adds
tokens, after some processing, to this index. There is no way to avoid
creation of another index at present.

However, it should be possible to modify it to store its fields inside an
existing Lucene index (maybe even Solr's own index). Contributions are
always welcome :)


> If the "sourceLocation" is mandatory when using a third party index for
> spelling suggestions, may I update the Solr WIKI to include this
> important information.


Sure, please go ahead. Thanks!

-- 
Regards,
Shalin Shekhar Mangar.


Re: Custom Transformer to handle Timestamp

2009-01-13 Thread Shalin Shekhar Mangar
On Tue, Jan 13, 2009 at 12:53 AM, con  wrote:

>
> Hi all
>
> I am using solr to index data from my database.
> In my database there is a timestamp field whose data will be in the form
> of
> 15-09-08 06:28:38.44200 AM. The column is of type TIMESTAMP in the
> oracle db.
> So in schema.xml i have mentioned:
>   <field ... />
>
> While indexing data in debug mode I get this timestamp value as
>
>    oracle.sql.TIMESTAMP:oracle.sql.timest...@f536e8
>
> And when I do a search this value is not displayed, while all other
> fields indexed along with it are displayed.


Hmm, interesting. It seems oracle.sql.TIMESTAMP does not inherit from
java.sql.Timestamp or java.util.Date. This is why DataImportHandler/Solr
cannot make sense out of it and the string representation is being stored in
the index.

However it has a toJdbc() method which will return a Jdbc compatible object.

http://download-uk.oracle.com/otn_hosted_doc/jdeveloper/904preview/jdbc-javadoc/oracle/sql/TIMESTAMP.html#toJdbc()


> 1) So do i need to write a custom transformer to add these values to the
> index.


Yes, it seems like that is the only way.


> 2)And if yes I am confused how it is? Is there a sample code somewhere?


Yes, see an example here -- http://wiki.apache.org/solr/DIHCustomTransformer


>
> I have tried the sample TrimTransformer and it is working. But can i
> convert
> this string to a valid date format.(I am not a java expert..:-( )?


I would start by trying something like this:

oracle.sql.TIMESTAMP timestamp = (oracle.sql.TIMESTAMP)
row.get("your_timestamp_field_name");
row.put("your_timestamp_field_name", timestamp.toJdbc());
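
Wrapped into a complete custom transformer it might look like this (a
sketch; the class name is illustrative):

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class OracleTimestampTransformer extends Transformer {
      public Object transformRow(Map<String, Object> row, Context context) {
        Object value = row.get("your_timestamp_field_name");
        if (value instanceof oracle.sql.TIMESTAMP) {
          try {
            // toJdbc() returns a JDBC-compatible object Solr can handle
            row.put("your_timestamp_field_name",
                    ((oracle.sql.TIMESTAMP) value).toJdbc());
          } catch (java.sql.SQLException e) {
            throw new RuntimeException(e);
          }
        }
        return row;
      }
    }

You would then reference it from data-config.xml via a
transformer="OracleTimestampTransformer" attribute on the entity.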


>
> --
> View this message in context:
> http://www.nabble.com/Custom-Transformer-to-handle-Timestamp-tp21421742p21421742.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: solrj delete by Id problem

2009-01-13 Thread Parisa


Shalin Shekhar Mangar wrote:
> 
> Did you call commit after the delete?

Of course I call commit. I tested both commit(false,false) and
commit(true,true); in both cases the result is the same.

> On Tue, Jan 13, 2009 at 4:12 PM, Parisa  wrote:
> 
>>
>> I have a problem with solrj deleteById. If I search a keyword that has
>> more than one result (for example 7), then delete one of the resulting
>> docs with solrj (server.deleteById) and search the keyword again, the
>> result count is zero. That's not correct: it should be 6, and the other
>> 6 docs should still show up.
>>
>>
>> I should mention that when I restart the server the result is correct
>> again (I mean 6 docs).
>>
>> Besides, the problem only occurs with keywords that were searched
>> before
>> deleting the docs; there is no problem with new keywords.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/solrj-delete-by-Id-problem-tp21433056p21433056.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solrj-delete-by-Id-problem-tp21433056p21435839.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler: UTF-8 and Mysql

2009-01-13 Thread Shalin Shekhar Mangar
On Mon, Jan 12, 2009 at 3:48 PM, gwk  wrote:

> 1. Posting UTF-8 data through the example post-script works and I get
> the proper results back when I query using the admin page.
> However, data imported through the DataImportHandler from a MySQL
> database (the database contains correct data, it's a copy of a
> production db and selecting through the client gives the correct
> characters) I get "Ã³" instead of "ó". I've tried several
> combinations of arguments to my datasource url
> (useUnicode=true&characterEncoding=UTF-8) but it does not seem to
> help. How do I get this to work correctly?


DataImportHandler does not change any encoding. It receives a Java string
object from the driver and adds it to Solr. So I'm guessing the problem is
in the database or in the driver. Did you create the tables with UTF-8
encoding? Try looking in the MySql driver configuration parameters to force
UTF-8. Sorry, I can't be of much help here.


> 2. On the wikipage for DataImportHandler, the deletedPkQuery has no
> real description, am I correct in assuming it should contain a
> query which returns the ids of items which should be removed from
> the index?


Yes you are right. It should return the primary keys of the rows to be
deleted.
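
For example (a sketch; table and column names are illustrative):

    <entity name="item" pk="id"
            query="select id, name from item"
            deletedPkQuery="select id from item where deleted = 1">
        ...
    </entity>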


>
>  3. Another question concerning the DataImportHandler wikipage, I'm
> not sure about the exact way the field-tag works. From the first
> data-config.xml example for the full-import I can infer that the
> "column"-attribute represents the column from the sql-query and
> the "name"-attribute represents the name of the field in the
> schema the column should map to. However further on in the
> RegexTransformer section there are column-attributes which do not
> correspond to the sql-query result set and its the "sourceColName"
> attribute which acually represents that data, which comes from the
> RegexTransformer I understand but why then is the "column"
> attribute used instead of the "name"-attribute. This has confused
> me somewhat, any clarification would be greatly appreciated.
>

DataImportHandler reads by "column" from the resultset and writes by "name"
to Solr (or if name is unspecified, by "column"). So column is compulsory
but "name" is optional.

The typical use-case for a RegexTransformer is when you want to read a field
(say "a"), process it (save it as "b") and then add it to Solr (by name
"c").

So you read by "sourceColName", process and save it as "column" and write to
Solr as "name". So if "name" is unspecified, it will be written to Solr as
"column". The reason we use column and not name is because the user may want
to do something more with it, for example use that field in a template and
save that template to Solr. I know it is a bit confusing but it helps us to
keep DIH general enough.
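
A concrete sketch of that read/process/write chain (all names are
illustrative):

    <entity name="person" transformer="RegexTransformer"
            query="select full_name from people">
        <field column="first" name="firstName"
               sourceColName="full_name" regex="(\S+) \S+"/>
        <field column="last" name="lastName"
               sourceColName="full_name" regex="\S+ (\S+)"/>
    </entity>

Here "full_name" is read (sourceColName), each captured group is saved as
"first"/"last" (column), and the values are written to Solr as
"firstName"/"lastName" (name).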

Hope that helps.

-- 
Regards,
Shalin Shekhar Mangar.


RE: Query regarding Spelling Suggestions

2009-01-13 Thread Deshpande, Mukta
Thanks all for the help and information.

Best Regards,
~Mukta

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Tuesday, January 13, 2009 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

On Tue, Jan 13, 2009 at 5:16 PM, Deshpande, Mukta 
wrote:

> I have following question / observation : (just to ensure that my 
> configurations are correct)
>
> The lucene-index (dictionary) "syn_index" is already an index so do we

> have to specify the spellcheckIndexDir again?
>(If I do not give the spellcheckIndexDir I do not get any
> suggestions.)


The "syn_index" here is your Lucene index from which you want to use as
the source for words. Spell checker processes each token to create
n-grams which are then stored into a lucene index at the
"spellCheckIndexDir" or in memory. This is why you need to specify both
sourceLocation and spellcheckIndexDir.

If you do not give spellCheckIndexDir, spell checker will create a
Lucene index in-memory, so it should still work. Are you sure you gave a
build command before issuing the query?


> When I give the build command the spellcheckIndexDir gets populated 
> reading the "syn_index". Can we avoid this duplication?


Spell checker needs a Lucene index to work. It creates a new one and
adds tokens after some processing to this index. There is no way to
avoid creation of another index at present.

However, it should be possible to modify it to store its fields inside
an existing Lucene index (maybe even Solr's own index). Contributions
are always welcome :)


> If the "sourceLocation" is mandatory when using a third party index 
> for spelling suggestions, may I update the Solr WIKI to include this 
> important information.


Sure, please go ahead. Thanks!

--
Regards,
Shalin Shekhar Mangar.


Re: Clustering Carrot2 + Solr

2009-01-13 Thread Grant Ingersoll

I've updated the patch for trunk.  I _believe_ it should now work.

-Grant

On Jan 8, 2009, at 9:32 AM, Jean-Philip EIMECKE wrote:


Thanks for considering my problem

Cheers,
Jean-Philip Eimecke





What do we mean by Searcher?

2009-01-13 Thread Manupriya

Hi,

I am somewhat new to Solr. While reading through documents/resources, I have
come across the 'Searcher' term many times. I am able to roughly understand
that whenever we fire any query, we are actually invoking a searcher. This
searcher searches through the index and returns results.

But I am not able to fully grasp its meaning. I referred to a previous post as
well - http://www.nabble.com/what-is-searcher-td15448682.html#a15448682.

I have also read through -
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Searcher.html#Searcher()

But I am not able to fully appreciate it.

I want to understand Searcher in a practical scenario - 

We use the Data Import feature of Solr to index database tables. Now, I send a
query (*:*) through the Solr Admin console for searching. And I get back a
search result. In this whole process, I have the following questions - 
1. What is the significance of the Searcher in this case?
2. When is the Searcher invoked?
3. Who invokes the Searcher?
4. Where is it stored?
5. When I send another query (manu:abc), will a new Searcher be created?
6. How is the searcher auto-warmed in this case?

Can anyone please direct me to some tutorial/resource for this?

Thanks,
Manu
-- 
View this message in context: 
http://www.nabble.com/What-do-we-mean-by-Searcher--tp21436737p21436737.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Clustering Carrot2 + Solr

2009-01-13 Thread Jean-Philip EIMECKE
Thank you so much Grant

Cheers

-- 
Jean-Philip Eimecke
jpeime...@gmail.com


prefetching question

2009-01-13 Thread Jae Joo
Hi,

We have 16 million company names and would like to find a way to do
"prefetching" using Solr.

Does anyone have experience and/or suggestions?

Thanks,

Jae Joo


Re: DataImportHandler: UTF-8 and Mysql

2009-01-13 Thread gwk

Shalin Shekhar Mangar wrote:

On Mon, Jan 12, 2009 at 3:48 PM, gwk  wrote:

  

1. Posting UTF-8 data through the example post-script works and I get
the proper results back when I query using the admin page.
However, data imported through the DataImportHandler from a MySQL
database (the database contains correct data, it's a copy of a
production db and selecting through the client gives the correct
characters) I get "Ã³" instead of "ó". I've tried several
combinations of arguments to my datasource url
(useUnicode=true&characterEncoding=UTF-8) but it does not seem to
help. How do I get this to work correctly?




DataImportHandler does not change any encoding. It receives a Java string
object from the driver and adds it to Solr. So I'm guessing the problem is
in the database or in the driver. Did you create the tables with UTF-8
encoding? Try looking in the MySql driver configuration parameters to force
UTF-8. Sorry, I can't be of much help here.


  
I checked again and you were right: while the columns contained
utf8-encoded strings, the actual encoding of the columns was set to
latin1. I've fixed the database and now it's working correctly.

2. On the wikipage for DataImportHandler, the deletedPkQuery has no
real description, am I correct in assuming it should contain a
query which returns the ids of items which should be removed from
the index?




Yes you are right. It should return the primary keys of the rows to be
deleted.


  

 3. Another question concerning the DataImportHandler wikipage, I'm
not sure about the exact way the field-tag works. From the first
data-config.xml example for the full-import I can infer that the
"column"-attribute represents the column from the sql-query and
the "name"-attribute represents the name of the field in the
schema the column should map to. However further on in the
RegexTransformer section there are column-attributes which do not
correspond to the sql-query result set and its the "sourceColName"
attribute which acually represents that data, which comes from the
RegexTransformer I understand but why then is the "column"
attribute used instead of the "name"-attribute. This has confused
me somewhat, any clarification would be greatly appreciated.




DataImportHandler reads by "column" from the resultset and writes by "name"
to Solr (or if name is unspecified, by "column"). So column is compulsory
but "name" is optional.

The typical use-case for a RegexTransformer is when you want to read a field
(say "a"), process it (save it as "b") and then add it to Solr (by name
"c").

So you read by "sourceColName", process and save it as "column" and write to
Solr as "name". So if "name" is unspecified, it will be written to Solr as
"column". The reason we use column and not name is because the user may want
to do something more with it, for example use that field in a template and
save that template to Solr. I know it is a bit confusing but it helps us to
keep DIH general enough.

Hope that helps.

  


Ok, that explains it for me, thanks for the clarification.




Facet Paging

2009-01-13 Thread gwk

Hi,

With the faceting parameters there is an option to support paging
through a large number of facet values. But to build proper paging it
would be helpful if the response contained the total number of facet
values (the count you would get if facet.limit were set to a negative
value), similar to an ordinary query response's numFound attribute, so
you can determine how many pages there should be. Is it possible to
request this information in the same response, and if so, how much does
it impact performance?
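
For reference, by paging I mean requests like this (a sketch; 20 values per
page, third page, the field name is illustrative):

    &facet=true&facet.field=mfg&facet.limit=20&facet.offset=40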


Regards,

gwk


getting DIH to read my XML files

2009-01-13 Thread Fergus McMenemie
Hello,

I am trying to use DIH with FileListEntityProcessor to walk the
disk and read XML documents. I have a dataConfig.xml as follows:-

   <dataConfig>
     <document>
       <entity name="jcurrent"
               processor="FileListEntityProcessor"
               fileName=".*xml"
               newerThan="'NOW-1000DAYS'"
               recursive="true"
               rootEntity="false"
               dataSource="null"
               baseDir="/Volumes/spare/ts/j/groups">
         <entity name="x"
                 processor="XPathEntityProcessor"
                 url="${jcurrent.fileAbsolutePath}"
                 stream="false"
                 forEach="/record"
                 transformer="DateFormatTransformer">
           <field column="..." xpath="/record/metadata/subject[@qualifier='fullTitle']"/>
           <field column="..." xpath="/record/metadata/subject[@qualifier='publication']"/>
           <field column="..." xpath="/record/metadata/subject[@qualifier='pubAbbrev']"/>
           <field column="..." xpath="/record/metadata/date[@qualifier='pubDate']"/>
         </entity>
       </entity>
     </document>
   </dataConfig>

But when I try and start the walker I get:-

   INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
   Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy onInit
   INFO: SolrDeletionPolicy.onInit: commits:num=2
   
commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_1,version=1231861070710,generation=1,filenames=[segments_1]
   
commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_2,version=1231861070711,generation=2,filenames=[segments_2]
   Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
   INFO: last commit = 1231861070711
   Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DocBuilder 
buildDocument
   SEVERE: Exception while processing: jcurrent document : null
   org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource 
:null available for entity :x Processing Document # 1
   at 
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
   at 
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
   Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
   SEVERE: Full Import failed
   org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource 
:null available for entity :x Processing Document # 1
   at 
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
   at 
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)

Anybody able to point out what I have done wrong?

Regards Fergus.
-- 
===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===


Re: getting DIH to read my XML files

2009-01-13 Thread Shalin Shekhar Mangar
Which version of Solr are you using?

I think there should be a dataSource="null" in the child entity as well.
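
That is, something like:

    <entity name="x"
            processor="XPathEntityProcessor"
            url="${jcurrent.fileAbsolutePath}"
            dataSource="null"
            ...>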

On Tue, Jan 13, 2009 at 9:28 PM, Fergus McMenemie  wrote:

> Hello,
>
> I am trying to use DIH with FileListEntityProcessor to to walk the
> disk and read XML documents. I have a dataConfig.xml as follows:-
>
>   <dataConfig>
>     <document>
>       <entity name="jcurrent"
>               processor="FileListEntityProcessor"
>               fileName=".*xml"
>               newerThan="'NOW-1000DAYS'"
>               recursive="true"
>               rootEntity="false"
>               dataSource="null"
>               baseDir="/Volumes/spare/ts/j/groups">
>         <entity name="x"
>                 processor="XPathEntityProcessor"
>                 url="${jcurrent.fileAbsolutePath}"
>                 stream="false"
>                 forEach="/record"
>                 transformer="DateFormatTransformer">
>           <field column="..." xpath="/record/metadata/subject[@qualifier='fullTitle']"/>
>           <field column="..." xpath="/record/metadata/subject[@qualifier='publication']"/>
>           <field column="..." xpath="/record/metadata/subject[@qualifier='pubAbbrev']"/>
>           <field column="..." xpath="/record/metadata/date[@qualifier='pubDate']"/>
>         </entity>
>       </entity>
>     </document>
>   </dataConfig>
>
>
> But when I try and start the walker I get:-
>
>   INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
>   Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy onInit
>   INFO: SolrDeletionPolicy.onInit: commits:num=2
>
> commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_1,version=1231861070710,generation=1,filenames=[segments_1]
>
> commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_2,version=1231861070711,generation=2,filenames=[segments_2]
>   Jan 13, 2009 3:38:11 PM org.apache.solr.core.SolrDeletionPolicy
> updateCommits
>   INFO: last commit = 1231861070711
>   Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DocBuilder
> buildDocument
>   SEVERE: Exception while processing: jcurrent document : null
>   org.apache.solr.handler.dataimport.DataImportHandlerException: No
> dataSource :null available for entity :x Processing Document # 1
>   at
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
>   at
> org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
>   at
> org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>   at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
>   at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
>   at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
>   Jan 13, 2009 3:38:11 PM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
>   SEVERE: Full Import failed
>   org.apache.solr.handler.dataimport.DataImportHandlerException: No
> dataSource :null available for entity :x Processing Document # 1
>   at
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:287)
>   at
> org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:86)
>   at
> org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:243)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>   at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
>   at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
>   at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
>
> Anybody able to point out what I have done wrong?
>
> Regards Fergus.
> --
> ===
> Fergus McMenemie   
> Email:fer...@twig.me.uk
> Techmore Ltd   Phone:(UK) 07721 376021
> Unix/Mac/Intranets Analyst Programmer
> ===
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: getting DIH to read my XML files

2009-01-13 Thread Fergus McMenemie
Shalin, thanks for the speedy response.

>Which version of Solr are you using?
Solr Implementation Version: nightly exported - yonik - 2008-11-13 08:05:48

>
>I think there should be a dataSource="null" in the child entity as well.
OK that had an effect; I now get:-

   Jan 13, 2009 4:42:28 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
   INFO: last commit = 1231864933487
   Jan 13, 2009 4:42:28 PM org.apache.solr.handler.dataimport.DocBuilder 
buildDocument
   SEVERE: Exception while processing: janescurrent document : null
   org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing 
failed for xml, 
url:/Volumes/spare/ts/janes/dtd/janesxml/data/news/jtic/groups/jwit0009.xmlrows 
processed :0 Processing Document # 1
   at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:283)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
   Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
   ... 9 more
   Caused by: java.lang.NullPointerException
   at 
com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
   at 
com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
   at 
com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
   at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
   at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
   at 
com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
   at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:81)
   ... 10 more
   Jan 13, 2009 4:42:28 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
   SEVERE: Full Import failed
   org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing 
failed for xml, 
url:/Volumes/spare/ts/janes/dtd/janesxml/data/news/jtic/groups/jwit0009.xmlrows 
processed :0 Processing Document # 1
   at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:283)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:309)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:337)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:397)
   at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:378)
   Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
   at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
   ... 9 more
   Caused by: java.lang.NullPointerException
   at 
com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
   at 
com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
   at 
com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
   

Re: What do we mean by Searcher?

2009-01-13 Thread Otis Gospodnetic
Manu,

If you truly want to get a better feeling for the notion of a Searcher, my
advice is to play with Lucene a little bit first.  Do you have a copy of Lucene
in Action?  You can get a cheaper version online at manning.com/hatcher2 if you
want, and quickly read a bit about Searcher in one of the early chapters.  In
short, the searcher is the object/the thing that performs searches against an
index.
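
In plain Lucene a searcher boils down to a couple of lines (a sketch against
the Lucene 2.3 API linked above; the path and field are illustrative):

    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    Hits hits = searcher.search(new TermQuery(new Term("manu", "abc")));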

More answers to your questions below.


> We use the Data Import feature of Solr to index database tables. Now, I send a
> query (*:*) through the Solr Admin console for searching. And I get back a
> search result. In this whole process, I have the following questions - 
> 1. What is the significance of the Searcher in this case?

The searcher is the thing that performed the search.  It took your query 
string, opened an index, ran the search, and got results.

> 2. When is Searcher invoked?

When you run a search request.

> 3. Who invokes the Searcher?

You do, when you call one of the SearchComponents or RequestHandlers, when you 
run a search request.

> 4. Where is it stored?

Searcher is not really "stored".  It's a piece of code that runs inside Solr, 
which runs inside a servlet container, which runs inside a JVM, and so on.

> 5. When I send another query (manu:abc), will a new Searcher be created?

No, the same searcher will be used unless you told Solr to open a new Searcher.

> 6. How is searcher auto-warmed in this case?

http://wiki.apache.org/solr/?action=fullsearch&context=180&value=autowarm&fullsearch=Text

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Manupriya 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 9:25:02 AM
> Subject: What do we mean by Searcher?
> 
> 
> Hi,
> 
> I am somewhat new to Solr. While reading through documents/resources, I have
> come across the 'Searcher' term many times. I am able to roughly understand
> that whenever we fire any query, we are actually invoking a searcher. This
> searcher searches through the index and returns results.
> 
> But I am not able to fully grasp its meaning. I referred to a previous post as
> well - http://www.nabble.com/what-is-searcher-td15448682.html#a15448682.
> 
> I have also read through -
> http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Searcher.html#Searcher()
> 
> But I am not able to fully appreciate it.
> 
> I want to understand Searcher in a practical scenario - 
> 
> We use the Data Import feature of Solr to index database tables. Now, I send a
> query (*:*) through the Solr Admin console for searching. And I get back a
> search result. In this whole process, I have the following questions - 
> 1. What is the significance of the Searcher in this case?
> 2. When is the Searcher invoked?
> 3. Who invokes the Searcher?
> 4. Where is it stored?
> 5. When I send another query (manu:abc), will a new Searcher be created?
> 6. How is the searcher auto-warmed in this case?
> 
> Can anyone please direct me to some tutorial/resource for this?
> 
> Thanks,
> Manu
> -- 
> View this message in context: 
> http://www.nabble.com/What-do-we-mean-by-Searcher--tp21436737p21436737.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to achieve combination of features grouping, scoring...

2009-01-13 Thread Otis Gospodnetic
Hi,

I don't think you can do any of that with Solr as it exists today.  My feeling 
is that you might want to model this new functionality/code after what's in 
SOLR-236, even though it's not the same thing as yours, or after the carrot2 
plugin.  I also have a feeling others might like this functionality, too, so if 
you can generalize and contribute, please consider doing that.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Norbert Hartl 
> To: SOLR mailing list 
> Sent: Tuesday, January 13, 2009 3:19:33 AM
> Subject: How to achieve combination of features grouping, scoring...
> 
> Hi,
> 
> I spent some time on Solr in order to figure out what
> it can do. I still have some problems finding the right
> way to do my search.
> 
> I have a bunch of heterogeneous objects that I want to
> search. All of these objects belong to an owner. When
> a search is issued I'd like not only to find the individual
> objects but also to have them grouped by their owner.
> 
> For grouping I didn't find much of value other than doing
> it in a response writer. I tried collapsing but
> that is not what I mean. And facets are still something
> different. The only thing I found is the XSLTResponseWriter,
> which can group the results afterwards. 
> 
> What is the best way to achieve this:
> 
> - how to group results when there are many of them to take
>   into account
> - how to score based on grouped objects. Grouping in
>   the response writer is not hard. But if I want to do
>   pagination I'd like to have the top-scored group at the
>   top of the results. Is there a way to do so?
> - I'd like to show only the fields that match a query. As 
>   someone hinted here on the ML, doing this with highlighting
>   is the only way I found. But then I don't understand why
>   I can provide a field list (hl.fl) but it does not take
>   a * for every field like some of the other parameters do.
> 
> Thanks in advance,
> 
> Norbert



Re: prefetching question

2009-01-13 Thread Chris Harris
Maybe it's just me, but I'm not sure what you mean by "prefetching".
(I don't even know if you're talking about an indexing-time activity
or a query-time activity.) My guess is that you'll get a more helpful
reply if you can make your question more specific.

Cheers,
Chris

On Tue, Jan 13, 2009 at 6:51 AM, Jae Joo  wrote:
> Hi,
>
> We do have 16 millions of company name and would like to find the way for
> "prefetching" by using Solr.
>
> Does anyone have experience and/or suggestions?
>
> Thanks,
>
> Jae Joo
>


RE: Commiting index while time-consuming query is running

2009-01-13 Thread Feak, Todd
I believe that when you commit, a new IndexReader is created, which is
warmed, etc. New incoming queries will be sent to this new IndexReader.
Once all previously existing queries have been answered, the old
IndexReader will shut down.

The commit doesn't wait for the query to finish, but it shouldn't impact
the results of that query either. What may be impacted is overall system
performance while you have 2 IndexReaders in play. There will always be
some amount of overlap, but it may be drawn out by the long query.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 13, 2009 2:18 PM
To: solr-user@lucene.apache.org
Subject: Commiting index while time-consuming query is running


Once in a while my Solr instance receives a query that takes a really
long
time to execute (several minutes or more). What will happen if I update
my
index (and commit) while one of these really long queries is executing?
Will
Solr wait for the query to complete before it commits my update?

(on a side note, I'm re-working my UI to eliminate these queries)

Thanks!
-- 
View this message in context:
http://www.nabble.com/Commiting-index-while-time-consuming-query-is-running-tp21445704p21445704.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: non fix highlight snippet size

2009-01-13 Thread Mike Klaas

On 13-Jan-09, at 12:48 AM, Marc Sturlese wrote:



> Hey there,
> I need a rule in my highlights that sets, for example, the snippet size
> to 400 in case there's just one snippet, 225 in case two snippets are
> found, and 125 in case 3 or more snippets are found. Is there any way to
> do that via solrconfig.xml (from what I have seen I don't think so...)
> or should I code a plugin? In the second case, do I need an extended
> class of GapFragmenter, or is it another piece of the source I should
> hack?
> Thanks in advance


There is no easy way to accomplish that, due to the architecture of
the highlighter (which first generates fragments and only then
determines whether they are snippets that contain the keyword(s)).


-Mike


Indexing the same data in many records

2009-01-13 Thread philmccarthy

Hi,

I'd like to use Solr to index some webserver logs, in order to allow easy
ad-hoc querying and analysis. Each Solr Document will represent a single
request to the webserver, with fields for time, request URL, referring URL
etc.

I'm also planning to fetch the page source of each referring URL, and add
that as an indexed field in the Solr document. The aim is to allow queries
like "find hits to /xyz.html where the referring page contains the word
'foobar'".

Since hundreds or even thousands of hits may all come from the same
referring page, would this approach be horribly inefficient? (Note the page
source won't be stored in each Document, just indexed). Am I going to
dramatically increase the index size if I do this?

If so, is there a more elegant way to do what I want?

Many thanks,
Phil



-- 
View this message in context: 
http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21448465.html
Sent from the Solr - User mailing list archive at Nabble.com.



faceted search returning multiple values for same field

2009-01-13 Thread Deo, Shantanu
Hi,
  I am using Solr for indexing some product data, and wanted to use
faceted search. My indexed field (mfg) sometimes contains two words,
"Sony Ericsson" for example. When I get the facets on mfg, Solr
returns "sony" and "ericsson" as separate hits. There are also some
facets that show up rather mysteriously.

My Unique list of mfg's that is indexed is as follows:
AT&T
BlackBerry?
HTC
LG
Motorola
Nokia
Option
Palm
Pantech
Samsung
Sierra Wireless
Sony Ericsson


The resulting facets being returned is below:
"facet_fields":{
"mfg":[
 "ericsson",195,
 "soni",156,
 "samsung",155,
 "nokia",90,
 "Ericsson",78,
 "Sony",78,
 "Samsung",62,
 "motorola",55,
 "lg",50,
 "sony",39,
 "Nokia",36,
 "pantech",25,
 "Motorola",22,
 "LG",20,
 "berri",16,
 "black",16,
 "blackberri",16,
 "Pantech",10,
 "BlackBerry",8,
 "blackberry",4,
 "AT",0,
 "HTC",0,
 "Option",0,
 "Palm",0,
 "Sierra",0,
 "T",0,
 "Wireless",0,
 "at",0,
 "att",0,
 "htc",0,
 "option",0,
 "palm",0,
 "sierra",0,
 "t",0,
 "wireless",0]


I have tried playing around with defining the fieldtype using the
following analyzers:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="..."/>
    <filter class="..."/>
    <filter class="..." words="manufacturer.txt"/>
  </analyzer>
</fieldType>

Any ideas if it's possible to get the same facets as are in the data
that's being indexed, or would I have to write my own filter for this
purpose?

Thanks
Shantanu Deo
AT&T eCommerce Web Hosting - Release Management
Office: (425)288-6081
email: sd1...@att.com


Re: faceted search returning multiple values for same field

2009-01-13 Thread Shalin Shekhar Mangar
On Wed, Jan 14, 2009 at 8:45 AM, Deo, Shantanu  wrote:

>
> I have tried playing around with defining the fieldtype using the
> following analyzers:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="..."/>
>     <filter class="..."/>
>     <filter class="..." words="manufacturer.txt"/>
>   </analyzer>
> </fieldType>
>
>
> Any ideas if it's possible to get the same facets as are in the data
> that's being indexed, or would I have to write my own filter for this
> purpose?


Faceting works on the indexed terms. Therefore, you should make sure that
what you index is exactly what you store. You probably need to facet on a
"string" type field.


>
>
> Thanks
> Shantanu Deo
> AT&T eCommerce Web Hosting - Release Management
> Office: (425)288-6081
> email: sd1...@att.com
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: faceted search returning multiple values for same field

2009-01-13 Thread Otis Gospodnetic
Shantanu,

It sounds like all you have to do is switch to a field type that doesn't 
tokenize your mfg field.  Try field type "string".  You'll need to reindex once 
you make this change.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "Deo, Shantanu" 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 10:15:09 PM
> Subject: faceted search returning multiple values for same field
> 
> Hi,
>   I am using Solr for indexing some product data, and wanted to use
> faceted search. My indexed field (mfg) sometimes contains two words,
> "Sony Ericsson" for example. When I get the facets on mfg, Solr
> returns "sony" and "ericsson" as separate hits. There are also some
> facets that show up rather mysteriously.
> 
> My Unique list of mfg's that is indexed is as follows:
> AT&T
> BlackBerry?
> HTC
> LG
> Motorola
> Nokia
> Option
> Palm
> Pantech
> Samsung
> Sierra Wireless
> Sony Ericsson
> 
> 
> The resulting facets being returned is below:
> "facet_fields":{
> "mfg":[
>  "ericsson",195,
>  "soni",156,
>  "samsung",155,
>  "nokia",90,
>  "Ericsson",78,
>  "Sony",78,
>  "Samsung",62,
>  "motorola",55,
>  "lg",50,
>  "sony",39,
>  "Nokia",36,
>  "pantech",25,
>  "Motorola",22,
>  "LG",20,
>  "berri",16,
>  "black",16,
>  "blackberri",16,
>  "Pantech",10,
>  "BlackBerry",8,
>  "blackberry",4,
>  "AT",0,
>  "HTC",0,
>  "Option",0,
>  "Palm",0,
>  "Sierra",0,
>  "T",0,
>  "Wireless",0,
>  "at",0,
>  "att",0,
>  "htc",0,
>  "option",0,
>  "palm",0,
>  "sierra",0,
>  "t",0,
>  "wireless",0]
> 
> 
> I have tried playing around with defining the fieldtype using the
> following analyzers:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="..."/>
>     <filter class="..."/>
>     <filter class="..." words="manufacturer.txt"/>
>   </analyzer>
> </fieldType>
> 
> Any ideas if it's possible to get the same facets as are in the data
> that's being indexed, or would I have to write my own filter for this
> purpose?
> 
> Thanks
> Shantanu Deo
> AT&T eCommerce Web Hosting - Release Management
> Office: (425)288-6081
> email: sd1...@att.com



Re: Indexing the same data in many records

2009-01-13 Thread Otis Gospodnetic
Phil,

From what you described so far, I don't see any red flags. I would pay
attention to reading those timestamps (covered on the Wiki and ML archives),
that's all.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: philmccarthy 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, January 13, 2009 8:49:33 PM
> Subject: Indexing the same data in many records
> 
> 
> Hi,
> 
> I'd like to use Solr to index some webserver logs, in order to allow easy
> ad-hoc querying and analysis. Each Solr Document will represent a single
> request to the webserver, with fields for time, request URL, referring URL
> etc.
> 
> I'm also planning to fetch the page source of each referring URL, and add
> that as an indexed field in the Solr document. The aim is to allow queries
> like "find hits to /xyz.html where the referring page contains the word
> 'foobar'".
> 
> Since hundreds or even thousands of hits may all come from the same
> referring page, would this approach be horribly inefficient? (Note the page
> source won't be stored in each Document, just indexed). Am I going to
> dramatically increase the index size if I do this?
> 
> If so, is there a more elegant way to do what I want?
> 
> Many thanks,
> Phil
> 
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21448465.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Custom Transformer to handle Timestamp

2009-01-13 Thread con

Thanks Shalin

That really helped!
I have created a plugin class and now things are working fine.
Thanks Again

Regards 
Con
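
P.S. For the archives, a minimal sketch of such a transformer, following
Shalin's toJdbc() suggestion below (package, class and column names are
made up for illustration):

  package com.example.dih;

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  // Converts oracle.sql.TIMESTAMP column values into JDBC-compatible
  // objects so DataImportHandler can index them as real dates.
  public class OracleTimestampTransformer extends Transformer {
    public Object transformRow(Map<String, Object> row, Context context) {
      Object value = row.get("MODIFY_DATE"); // hypothetical column name
      if (value instanceof oracle.sql.TIMESTAMP) {
        try {
          row.put("MODIFY_DATE", ((oracle.sql.TIMESTAMP) value).toJdbc());
        } catch (java.sql.SQLException e) {
          // leave the original value in place if conversion fails
        }
      }
      return row;
    }
  }

It is registered on the entity in data-config.xml via
transformer="com.example.dih.OracleTimestampTransformer".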



Shalin Shekhar Mangar wrote:
> 
> On Tue, Jan 13, 2009 at 12:53 AM, con  wrote:
> 
>>
>> Hi all
>>
>> I am using solr to index data from my database.
>> In my database there is a timestamp field whose data is of the form
>> 15-09-08 06:28:38.44200 AM. The column is of type TIMESTAMP in the
>> Oracle DB.
>> So in schema.xml I have the field defined as:
>>   [field definition stripped by the list archive]
>>
>> While indexing data in debug mode I get this timestamp value as
>> 
>>oracle.sql.TIMESTAMP:oracle.sql.timest...@f536e8
>> 
>>
>> And when I do a search, this value is not displayed while all other
>> fields indexed along with it are displayed.
> 
> 
> Hmm, interesting. It seems oracle.sql.TIMESTAMP does not inherit from
> java.sql.Timestamp or java.util.Date. This is why DataImportHandler/Solr
> cannot make sense out of it and the string representation is being
> stored in the index.
> 
> However, it has a toJdbc() method which will return a JDBC-compatible
> object.
> 
> http://download-uk.oracle.com/otn_hosted_doc/jdeveloper/904preview/jdbc-javadoc/oracle/sql/TIMESTAMP.html#toJdbc()
> 
> 
>> 1) So do I need to write a custom transformer to add these values to the
>> index?
> 
> 
> Yes, it seems like that is the only way.
> 
> 
>> 2) And if yes, I am confused about how to do it. Is there sample code somewhere?
> 
> 
> Yes, see an example here --
> http://wiki.apache.org/solr/DIHCustomTransformer
> 
> 
>>
>> I have tried the sample TrimTransformer and it is working. But how can I
>> convert this string to a valid date format? (I am not a Java expert. :-( )
> 
> 
> I would start by trying something like this:
> 
> oracle.sql.TIMESTAMP timestamp = (oracle.sql.TIMESTAMP)
> row.get("your_timestamp_field_name");
> row.put("your_timestamp_field_name", timestamp.toJdbc());
> 
> 
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Custom-Transformer-to-handle-Timestamp-tp21421742p21421742.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Custom-Transformer-to-handle-Timestamp-tp21421742p21450404.html
Sent from the Solr - User mailing list archive at Nabble.com.



CommonsHttpSolrServer in multithreaded env

2009-01-13 Thread Gargate, Siddharth
Hi all,
Is it safe to use a single CommonsHttpSolrServer instance in a
multithreaded environment? I have multiple threads accessing one static
CommonsHttpSolrServer object, but sometimes the application blocks.
Following is the stack trace printed for all threads:
 
  "indexthread1" Id=47 prio=5 RUNNABLE (in native)
   Blocked (cnt): 5853; Waited (cnt): 30
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 - locked java.io.bufferedinputstr...@147d387
   at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
   at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
   at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
   at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
   at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
   at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
   at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
   at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
   at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:335)
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
   at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:85)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:74)
   at wt.index.SolrIndexDelegate.index(SolrIndexDelegate.java:84)
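
A single CommonsHttpSolrServer is generally intended to be shared across
threads, and threads sitting in socketRead0 as above are usually waiting on
the server's response (a long commit, for example) rather than on a
client-side lock. A hedged sketch of giving the shared instance an
explicitly sized connection pool (SolrJ 1.3-era APIs; pool sizes and URL are
illustrative):

  import java.net.URL;
  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class SharedSolrServer {
    public static void main(String[] args) throws Exception {
      // One pooled connection manager shared by every indexing thread.
      MultiThreadedHttpConnectionManager mgr =
          new MultiThreadedHttpConnectionManager();
      mgr.getParams().setDefaultMaxConnectionsPerHost(20); // default is 2
      mgr.getParams().setMaxTotalConnections(20);

      // A single server instance, safe to hand to all threads.
      CommonsHttpSolrServer server = new CommonsHttpSolrServer(
          new URL("http://localhost:8983/solr"), new HttpClient(mgr));
      server.ping(); // smoke test
    }
  }

Batching documents and committing less often also tends to help, since
commits serialize work on the server side.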


 


Searchable and Non Searchable Fields

2009-01-13 Thread con

Hi All

I am using DataImportHandler to index values from an Oracle DB.

My sample rows are like:

1) FirstName -> George,   LastName -> Bush,       Country -> US
2) FirstName -> Georgeon, LastName -> Washington, Country -> US
3) FirstName -> Tony,     LastName -> George,     Country -> UK
4) FirstName -> Gordon,   LastName -> Brown,      Country -> UK
5) FirstName -> Vladimer, LastName -> Putin,      Country -> Russia

How can I make only the FirstName field searchable?
For example, if I search for George, I should get the FirstName, LastName and
Country of the first and second rows only, and if I search for Bush, no
results should be returned.

I tried providing various options for the <field> definitions in schema.xml:

   [field definitions stripped by the list archive]

But it is not giving the exact results.

How can I change the field attributes to get this result? Or is there
some other config for this?

Expecting reply
Thanks in advance
con
-- 
View this message in context: 
http://www.nabble.com/Searchable-and-Non-Searchable-Fields-tp21450664p21450664.html
Sent from the Solr - User mailing list archive at Nabble.com.
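
A minimal schema sketch for this (field names taken from the message above;
type names illustrative, untested): indexed="false" makes a field
unsearchable, while stored="true" keeps it in the returned documents.

  <field name="FirstName" type="text" indexed="true" stored="true"/>
  <field name="LastName" type="string" indexed="false" stored="true"/>
  <field name="Country" type="string" indexed="false" stored="true"/>
  <defaultSearchField>FirstName</defaultSearchField>

With this, q=George matches only on FirstName, and the stored LastName and
Country values are still returned with each hit.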



Re: Issue in Facet on date field

2009-01-13 Thread Chris Hostetter

: I have to create two facets on a date field:
: 1) First Facet will have results between two date range , i.e. [NOW TO
: NOW+45DAYS]
: 2) Second Facet will have results between two date range , i.e. [NOW-45DAYS
: TO NOW]

the date faceting code is designed to generate counts for regular 
intervals of time (specified by "gap") between a fixed start and end.  
you could probably get what you want with something like...

  facet.date.start = NOW-45DAYS
  facet.date.end = NOW+45DAYS
  facet.date.gap = +45DAYS

...but to be perfectly honest, if you know you want exactly two counts, 
one for the last 45 days and one for the next 45 days, then date faceting 
is overkill (and overly complicated) for your use case ... just use facet 
queries...

  facet.query=productPublicationDate_product_dt:[NOW-45DAYS TO NOW]
  facet.query=productPublicationDate_product_dt:[NOW TO NOW+45DAYS]

BTW: you'll probably want to replace "NOW" with "NOW/DAY" or "NOW/HOUR" to 
round down and get better cache utilization.


-Hoss
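
Put together, the request might look something like this (the spaces and '+'
need URL-encoding when sent over HTTP):

  q=*:*&facet=true
   &facet.query=productPublicationDate_product_dt:[NOW/DAY-45DAYS TO NOW/DAY]
   &facet.query=productPublicationDate_product_dt:[NOW/DAY TO NOW/DAY+45DAYS]

Each facet.query comes back as its own entry under facet_queries in the
response, so the two ranges can no longer overwrite each other the way the
duplicated date-facet parameters did.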