Re: catchall field minus one field

2012-01-12 Thread elisabeth benoit
thanks a lot for your advice, I'll try that.

Best regards,
Elisabeth

2012/1/11 Erick Erickson 

> Hmmm, Once the data is included in the catch-all, it's indistinguishable
> from
> all the rest of the data, so I don't see how you could do this. A clause
> like:
> -excludeField:[* TO *] would exclude all documents that had any data in
> the field so that's probably not what you want.
>
> Could you approach it the other way? Do NOT put the special field in
> the catch-all field in the first place, but massage the input to add
> a clause there? I.e. your "usual" case would have
> catchall: exclude_field:, but your
> special one would just be catchall:.
>
> You could set up request handlers to do this under the covers, so your
> queries would really be
> ...solr/usual?q=
> ...solr/special?q=
> and two different request handlers (edismax-style I'm thinking)
> would differ only by the "qf" field containing or not containing
> your special field.
>
> the other way, of course, would be to have a second catch-all
> field that didn't have your special field, then use one or the other
> depending, but as you say that would increase the size of your
> index...
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 9:47 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I have a catchall field, and I need to run some requests against all fields
> > of that catchall field, minus one. To avoid duplicating my index, I'd like
> > to know if there is a way to use my catchall field while excluding that one
> > field.
> >
> > Thanks,
> > Elisabeth
>
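A minimal sketch of the two-request-handler setup described above (handler names, the edismax defaults and the field names are assumptions, not taken from this thread):

<requestHandler name="/usual" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- searches the catchall plus the field that is normally kept out of it -->
    <str name="qf">catchall special_field</str>
  </lst>
</requestHandler>

<requestHandler name="/special" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- same handler, qf simply omits the special field -->
    <str name="qf">catchall</str>
  </lst>
</requestHandler>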


Restricting access to shards / collections with SolrCloud

2012-01-12 Thread Jaran Nilsen
Hi.

We're currently looking at SolrCloud to improve management of our Solr
cluster. There is one use case I am wondering whether SolrCloud provides
any support for out of the box, or whether our best bet is to stick with
our current solution.

The use case is:

We have a large number of shards, using the same schema - so, perfect for
SolrCloud. Some of these shards should have restricted access, meaning only
customers with certain privileges will be able to query them. The way we
solve this today is to maintain a database listing those users who have
access to these restricted shards. When building the shards-parameter for
querying Solr, we then use this database to append the URLs of the
restricted shards ONLY if the user has access to them.
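For illustration, the kind of request this produces might look like the following (hostnames and paths here are made up):

   .../solr/select?q=...&shards=shard1:8983/solr,shard2:8983/solr
   .../solr/select?q=...&shards=shard1:8983/solr,shard2:8983/solr,restricted1:8983/solr   (user with access)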

With SolrCloud it would be great to be able to use the distrib=true
parameter, but that would override the approach we're currently using.

My questions are:

1. Would it make sense to create a separate collection for the shards that
are restricted? If so, is there currently any support for specifying which
collections to search, so that we could implement the solution outlined
above for collections rather than shards?

2. If #1 is a no-go, are we better off sticking with our current approach
and skipping distrib=true, which would query all shards?

Any input appreciated!

Best,
Jaran

-- 
Jaran Nilsen
Skype: jaran.nilsen
jarannilsen.com || codemunchies.com || notpod.com
twitter.com/jarannilsen // www.linkedin.com/in/jarannilsen //
facebook.com/jaran.nilsen


Re: Large data set or data corpus

2012-01-12 Thread jmuguruza
http://www.data.gov/ has lots of datasets available for free

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Large-data-set-or-data-corpus-tp3650316p3653154.html
Sent from the Solr - User mailing list archive at Nabble.com.


Not able to see output in XML output

2012-01-12 Thread rajalapati
Hi,

In my Solr setup I have a query-based data-config written, and I was able to
complete the steps below, but I am not able to see any output.


1) Register the DataImport request handler in solrconfig.xml
2) Modify data-config.xml with the appropriate query to import the data,
which includes using the jTDS driver for SQL Server
3) Modify solrconfig.xml to register db-data-config.xml in the request
handler entry
4) Modify schema.xml for the output result. Right now we are facing issues
here; the two files (schema.xml and db-data-config.xml) follow below.

Schema.xml

[the field and fieldType definitions were stripped by the mailing list
archive; only these values survive:]

   FileId
   FileId


db-data-config.xml

[the data-config content was stripped by the mailing list archive]


5) Make a full-import HTTP request so the data gets indexed into the Solr
server. I can see that all the rows are indexed, but I am not able to find
any results when I search from the admin page.

6) Am I missing any step to configure the output? I have changed
db-data-config.xml, schema.xml and solrconfig.xml. Do I need to change any
other files for the output?
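For reference, a minimal sketch of a db-data-config.xml using the jTDS driver for SQL Server; the host, database, table and column names below are made up, not recovered from the stripped attachment:

<dataConfig>
  <dataSource driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://dbhost:1433/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- each row becomes one Solr document; FileId must match the uniqueKey in schema.xml -->
    <entity name="file" query="SELECT FileId, Title FROM Files">
      <field column="FileId" name="FileId"/>
      <field column="Title" name="Title"/>
    </entity>
  </document>
</dataConfig>

A common cause of "rows indexed but no search results" is fields that are stored but not indexed, or queries that do not target the field (or defaultSearchField) the admin search box actually uses.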


Thanks
Raj Deep 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-see-output-in-XML-output-tp3653445p3653445.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve shows up when we list all documents without
any specific criteria. Since we bring up the most recent ones and the ones
that contain images, we end up with a lot of listings from a single site,
because the documents are indexed in batches from the same site. At some
point we have several documents from the same site with the same date/time
and with images. I'm trying to add some randomness to this search so that
other documents can also appear in between that big block from the same
source.
Does grouping help to achieve this?

Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson wrote:

> Alexandre:
>
> Have you thought about grouping? If you can analyze the incoming
> documents and include a field such that "similar" documents map
> to the same value, then group on that value and you'll get output that
> isn't dominated by repeated copies of the "similar" documents. It
> depends, though, on being able to do a suitable mapping.
>
> In your case, could the mapping just be the site from which you
> got the data?
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco 
> wrote:
> > Erick,
> >
> > Probably I wrote something silly. You are right on either
> sorting
> > by field or ranking.
> > I just need to change the ranking to shift things around as you said.
> >
> > To clarify the use case:
> > We have a listing aggregator that gets product listings from a lot of
> > different sites and since they are added in batches, sometimes you see a
> > lot of pages from the same source (site). We are working on some changes
> to
> > shift things around and reduce this "blocking" effect, so we can present
> > mixed sources on the result pages.
> >
> > I guess I will start with the document random field and later try to
> > develop a custom plugin to make things better.
> >
> > Thanks for the pointers.
> >
> > Regards,
> > Alexandre
> >
> > On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson wrote:
> >
> >> I really don't understand what this means:
> >> "random sorting for the records but also preserving the ranking"
> >>
> >> Either you're sorting on rank or you're not. If you mean you're
> >> trying to shift things around just a little bit, *mostly* respecting
> >> relevance then I guess you can do what you're thinking.
> >>
> >> You could create your own function query to do the boosting, see:
> >> http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
> >>
> >> which would keep you from having to re-index your data to get
> >> a different "randomness".
> >>
> >> You could also consider external file fields, but I think your
> >> own function query would be cleaner. I don't think math.random
> >> is a supported function OOB
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco 
> >> wrote:
> >> > Hello all,
> >> >
> >> > Recently i've been trying to tweak some aspects of relevancy in one
> >> listing
> >> > project.
> >> > I need to give a higher score to newer documents and also boost the
> >> > document based on a boolean field that indicates the listing has
> >> pictures.
> >> > On top of that, in some situations we need a random sorting for the
> >> records
> >> > but also preserving the ranking.
> >> >
> >> > I tried to combine some techniques described in the Solr Relevancy FAQ
> >> > wiki, but when I add the random sorting, the ranking gets messy (as
> >> > expected).
> >> >
> >> > This works well:
> >> >
> >>
> http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score
> >> >
> >> > This does not work, gives a random order on what is already ranked
> >> >
> >>
> http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score&sort=random_1+desc
> >> >
> >> > The only way I see is to create another field on the schema
> containing a
> >> > random value and use it to boost the document the same way that was
> tone
> >> on
> >> > the boolean field.
> >> > Anyone tried something like this before and knows some way to get it
> >> > working?
> >> >
> >> > Thanks,
> >> > Alexandre
> >>
>


Re: Relevancy and random sorting

2012-01-12 Thread Michael Kuhlmann

Does the random sort function help you here?

http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

However, you will then also get some very old listings, if that's okay for you.

-Kuli

Am 12.01.2012 14:38, schrieb Alexandre Rocco:

Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve is when we list all documents without any
specific criteria. Since we bring the most recent ones and the ones that
contains images, we end up having a lot of listings from a single site,
since the documents are indexed in batches from the same site. At some
point we have several documents from the same site in the same date/time
and having images. I'm trying to give some random aspect to this search so
other documents can also appear in between that big dataset from the same
source.
Does the grouping help to achieve this?

Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson wrote:


Alexandre:

Have you thought about grouping? If you can analyze the incoming
documents and include a field such that "similar" documents map
to the same value, then group on that value and you'll get output that
isn't dominated by repeated copies of the "similar" documents. It
depends, though, on being able to do a suitable mapping.

In your case, could the mapping just be the site from which you
got the data?

Best
Erick

On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco
wrote:

Erick,

Probably I wrote something silly. You are right on either

sorting

by field or ranking.
I just need to change the ranking to shift things around as you said.

To clarify the use case:
We have a listing aggregator that gets product listings from a lot of
different sites and since they are added in batches, sometimes you see a
lot of pages from the same source (site). We are working on some changes

to

shift things around and reduce this "blocking" effect, so we can present
mixed sources on the result pages.

I guess I will start with the document random field and later try to
develop a custom plugin to make things better.

Thanks for the pointers.

Regards,
Alexandre

On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson wrote:

I really don't understand what this means:
"random sorting for the records but also preserving the ranking"

Either you're sorting on rank or you're not. If you mean you're
trying to shift things around just a little bit, *mostly* respecting
relevance then I guess you can do what you're thinking.

You could create your own function query to do the boosting, see:
http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

which would keep you from having to re-index your data to get
a different "randomness".

You could also consider external file fields, but I think your
own function query would be cleaner. I don't think math.random
is a supported function OOB

Best
Erick


On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco
wrote:

Hello all,

Recently i've been trying to tweak some aspects of relevancy in one

listing

project.
I need to give a higher score to newer documents and also boost the
document based on a boolean field that indicates the listing has

pictures.

On top of that, in some situations we need a random sorting for the

records

but also preserving the ranking.

I tried to combine some techniques described in the Solr Relevancy FAQ
wiki, but when I add the random sorting, the ranking gets messy (as
expected).

This works well:




http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score


This does not work, gives a random order on what is already ranked




http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score&sort=random_1+desc


The only way I see is to create another field on the schema

containing a

random value and use it to boost the document the same way that was

tone

on

the boolean field.
Anyone tried something like this before and knows some way to get it
working?

Thanks,
Alexandre










Re: Highlighting issue with PlainTextEntityProcessor.

2012-01-12 Thread meghana
Hi Erick, thanks for your reply.

And yes, the data was in the index, but I found the problem, and it was not
PlainTextEntityProcessor. Highlighting was coming back on the multivalued
field, while the non-multivalued field had fewer highlights, so I thought the
problem might be in PlainTextEntityProcessor.

The actual problem was that my search field is very big. I increased the
hl.maxAnalyzedChars limit, and it started working.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-issue-with-PlainTextEntityProcessor-tp3650004p3653708.html
Sent from the Solr - User mailing list archive at Nabble.com.


FacetComponent: suppress original query

2012-01-12 Thread Dmitry Kan
Hello list,

I need to split the incoming original facet query into a list of
sub-queries. The logic is done, and each sub-query gets added to the outgoing
queue with rb.addRequest(), where rb is an instance of ResponseBuilder.
In the logs I see that, along with the sub-queries, the original query gets
submitted too. Is there a way of suppressing the original query?

-- 
Regards,

Dmitry Kan


Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Michael,

We are using the random sorting in combination with date and other fields
but I am trying to change this to affect the ranking instead of sorting
directly.
That way we can also use other useful tweaks on the rank itself.

Alexandre

On Thu, Jan 12, 2012 at 11:46 AM, Michael Kuhlmann  wrote:

> Does the random sort function help you here?
>
> http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
>
> However, you will get some very old listings then, if it's okay for you.
>
> -Kuli
>
> Am 12.01.2012 14:38, schrieb Alexandre Rocco:
>
>  Erick,
>>
>> This document already has a field that indicates the source (site).
>> The issue we are trying to solve is when we list all documents without any
>> specific criteria. Since we bring the most recent ones and the ones that
>> contains images, we end up having a lot of listings from a single site,
>> since the documents are indexed in batches from the same site. At some
>> point we have several documents from the same site in the same date/time
>> and having images. I'm trying to give some random aspect to this search so
>> other documents can also appear in between that big dataset from the same
>> source.
>> Does the grouping help to achieve this?
>>
>> Alexandre
>>
>> On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson wrote:
>>
>>  Alexandre:
>>>
>>> Have you thought about grouping? If you can analyze the incoming
>>> documents and include a field such that "similar" documents map
>>> to the same value, then group on that value and you'll get output that
>>> isn't dominated by repeated copies of the "similar" documents. It
>>> depends, though, on being able to do a suitable mapping.
>>>
>>> In your case, could the mapping just be the site from which you
>>> got the data?
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco
>>> wrote:
>>>
 Erick,

 Probably I wrote something silly. You are right on either

>>> sorting
>>>
 by field or ranking.
 I just need to change the ranking to shift things around as you said.

 To clarify the use case:
 We have a listing aggregator that gets product listings from a lot of
 different sites and since they are added in batches, sometimes you see a
 lot of pages from the same source (site). We are working on some changes

>>> to
>>>
 shift things around and reduce this "blocking" effect, so we can present
 mixed sources on the result pages.

 I guess I will start with the document random field and later try to
 develop a custom plugin to make things better.

 Thanks for the pointers.

 Regards,
 Alexandre

 On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson wrote:

  I really don't understand what this means:
> "random sorting for the records but also preserving the ranking"
>
> Either you're sorting on rank or you're not. If you mean you're
> trying to shift things around just a little bit, *mostly* respecting
> relevance then I guess you can do what you're thinking.
>
> You could create your own function query to do the boosting, see:
> http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
>
> which would keep you from having to re-index your data to get
> a different "randomness".
>
> You could also consider external file fields, but I think your
> own function query would be cleaner. I don't think math.random
> is a supported function OOB
>
> Best
> Erick
>
>
> On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco
> wrote:
>
>> Hello all,
>>
>> Recently i've been trying to tweak some aspects of relevancy in one
>>
> listing
>
>> project.
>> I need to give a higher score to newer documents and also boost the
>> document based on a boolean field that indicates the listing has
>>
> pictures.
>
>> On top of that, in some situations we need a random sorting for the
>>
> records
>
>> but also preserving the ranking.
>>
>> I tried to combine some techniques described in the Solr Relevancy FAQ
>> wiki, but when I add the random sorting, the ranking gets messy (as
>> expected).
>>
>> This works well:
>>
>>
>>> http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22&fl=*,score
>>>

>> This does not work, gives a random order on what is already ranked
>>
>>
>>> http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspi

Re: Relevancy and random sorting

2012-01-12 Thread Ahmet Arslan
> This document already has a field that indicates the source
> (site).
> The issue we are trying to solve is when we list all
> documents without any
> specific criteria. Since we bring the most recent ones and
> the ones that
> contains images, we end up having a lot of listings from a
> single site,
> since the documents are indexed in batches from the same
> site. At some
> point we have several documents from the same site in the
> same date/time
> and having images. I'm trying to give some random aspect to
> this search so
> other documents can also appear in between that big dataset
> from the same
> source.
> Does the grouping help to achieve this?

Yes, see http://wiki.apache.org/solr/FieldCollapsing
You would display at most 3 documents from a single site, and add a link
saying "there are xxx more documents from site yyy, click here to see all of
them".
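A sketch of the request parameters being described, assuming the source field is called site:

   ...&group=true&group.field=site&group.limit=3

group.limit caps how many documents come back per site, and each group's numFound can drive the "xxx more documents from site yyy" link.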


Re: Search Specific Boosting

2012-01-12 Thread Brett

Hi Erick,

Yeah, I've reviewed the debug output and can't make sense of why they
are scoring the same. I have double-checked that they are being indexed
with different boost values for the search field. I've also increased
the factors to make them more granular, so instead of boosting
1,2,3,4,5 I did 100,200,300,400,500... same result.


Here's an example of the debug output with two documents having
different field boost values but receiving the same score.


Does anything stick out?  Any other ideas on how to get the results I am 
looking for?



69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4016) [DefaultSimilarity], result of:
      104.45359 = score(doc=4016,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4016, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4016)
  0.667 = coord(2/3)




69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4106) [DefaultSimilarity], result of:
      104.45359 = score(doc=4106,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4106, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4106)
  0.667 = coord(2/3)
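The identical fieldNorm values (40.0) in both explanations fit the norm-byte bucketing Erick describes below: the index-time boost is folded into the field norm, which is stored as a single byte. A rough, self-contained sketch of that encoding using Lucene's SmallFloat helper (treat the exact class and method names as an assumption for your Lucene version):

import org.apache.lucene.util.SmallFloat;

public class NormBucketDemo {
  public static void main(String[] args) {
    // The stored norm (lengthNorm * index-time boost) has only a 3-bit
    // mantissa, so nearby boost values can collapse to the same byte and
    // therefore produce identical scores.
    for (float boost : new float[] {60f, 62f, 65f, 70f, 100f}) {
      byte b = SmallFloat.floatToByte315(boost);
      System.out.println(boost + " -> stored as " + SmallFloat.byte315ToFloat(b));
    }
  }
}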



On 1/11/2012 9:46 PM, Erick Erickson wrote:

Boosts are fairly coarse-grained. I suspect your boost factors are just
being rounded into the same buckets. Attaching &debugQuery=on and
looking at how the scores were calculated should help you figure out
if this is the case.

Best
Erick

On Wed, Jan 11, 2012 at 7:57 PM, Brett  wrote:

I'm implementing a feature where admins have the ability to control the
order of the results by adding a boost to any specific search.

The search is a faceted interface (no text input) and which we take a hash
of the search parameters (to form a unique search id) and then boost that
field for the document.

The field is a wild card field so that it might look like this:

true

The problem is that in these search results I am seeing is that my results
are being grouped and the individual boost values are not having the
granular effect I am looking for.

Say, on a result set of 75 documents, I see results with search boosts of
60-70 receiving the same score even though they were indexed with different
boost values. There is always more than one group.

Does anyone know what might be causing this?  Is there a better way to do
what I am looking for?

Thanks,

Brett


Field Definition:

[the field type and dynamic field definitions were stripped by the mailing
list archive]



SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread Joey Grimm
Hi,

I am trying to use a DataImportHandler to import data from an Oracle DB. It
works for non-date fields but throws an exception once I include the
MODIFIEDDATE field (an oracle.sql.TIMESTAMP column). Can anyone see what I'm
doing wrong here?  Thanks.



schema.xml

[field definition stripped by the mailing list archive]

db-data-config.xml

[data-config stripped by the mailing list archive; the entity query is
SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY, with
MODIFIEDDATE mapped to the catModifiedDate field]

WARNING: Error creating document :
SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565},
masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118},
catIconId=catIconId(1.0)={304856}}]
org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field
'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.common.SolrException: Invalid Date
String:'oracle.sql.TIMESTAMP@1e58565'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Invalid-Date-String-oracle-sql-TIMESTAMP-tp3654419p3654419.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread Colin Bennett
Hi,

It looks like a date formatting issue, the Solr date field expects something
like 1995-12-31T23:59:59.999Z

See http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

The data import handler does have a date transformer to convert dates

http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer


Colin.
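A sketch of the DateFormatTransformer usage from that wiki page; the query and field mapping are the ones from the original post, while the entity name and the format string are guesses that depend on what the driver actually returns:

<entity name="category" transformer="DateFormatTransformer"
        query="SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY">
  <field column="MODIFIEDDATE" name="catModifiedDate"
         dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>

If the JDBC driver hands back an oracle.sql.TIMESTAMP object rather than a string, casting to a plain DATE/TIMESTAMP in the SELECT, or setting convertType="true" on the JdbcDataSource, are other commonly reported routes.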



-Original Message-
From: Joey Grimm [mailto:jgr...@rim.com] 
Sent: Thursday, January 12, 2012 1:05 PM
To: solr-user@lucene.apache.org
Subject: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

Hi,

I am trying to use a dataImportHandler to import data from an oracle DB.  It
works for non-date fields but is throwing an exception once I included the
MODIFIEDDATE field (oracle.timestamp field).  Can anyone see what I'm doing
wrong here?  Thanks.



schema.xml

[stripped by the mailing list archive]

db-data-config.xml

[stripped by the mailing list archive]
WARNING: Error creating document :
SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAM
P@1e58565},
masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118},
catIconId=catIconId(1.0)={304856}}]
org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field
'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProc
essorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProc
essorFactory.java:115)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHand
ler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:
636)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268
)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.ja
va:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427
)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.common.SolrException: Invalid Date
String:'oracle.sql.TIMESTAMP@1e58565'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)

--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrException-Invalid-Date-String-oracle-
sql-TIMESTAMP-tp3654419p3654419.html
Sent from the Solr - User mailing list archive at Nabble.com.





RE: Determining which shard is failing using partialResults / some other technique?

2012-01-12 Thread Gilles Comeau
Hi all,

 

Is there at least a way to log which shard is being called, and perhaps log
a failure?

 

INFO: [master] webapp=/solr path=/select
params={facet=true&facet.mincount=1&facet.sort=count&q=(content_1500_chars:(
("allied+irish+banks"+OR+"+aib+")+AND+NOT+(bluray+OR+"RAR+"+OR+"mega+pack"))
+OR+title:(("allied+irish+banks"+OR+"+aib+")+AND+NOT+(bluray+OR+"RAR+"+OR+"m
ega+pack")))+&facet.limit=10&facet.shard.limit=300&distrib=true&facet.field=
organisation&wt=javabin&fq=harvest_time_long:[131077440+TO+132641279
]&rows=0&version=2} status=0 QTime=16192

 

Regards,

Gilles

 

From: Gilles Comeau [mailto:gilles.com...@polecat.co] 
Sent: 12 January 2012 07:02
To: 'solr-user@lucene.apache.org'
Subject: Determining which shard is failing using partialResults / some
other technique?

 

Hi Solr Users,

 

Does anyone happen to know if the partialResults keyword can be used in a
Solr HTTP request?   (partialResults is turned off at the .xml level)

 

Something like: http://server:8080/solr/master/select?distrib=true

&rows=500&fl=*,score&start=0&partialResults=true&q=my+and+query&fq=harvest_t
ime_long:[132537600+TO+132537600]


We have a Solr instance that is periodically failing on distributed
requests, and I am trying to narrow down which one of the shards is causing
the failure.   If the above doesn't work, can someone point me to a resource
or give advice on how to find out which node might be causing the issue?

 

Regards,

 

Gilles



a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread jmuguruza
If I have individual files in the expected Solr format (having just ONE doc
per file):


  
<add>
  <doc>
    <field name="id">GB18030TEST</field>
    <field name="name">Test with some GB18030 encoded characters</field>
    <field name="features">No accents here</field>
    <field name="features">这是一个功能</field>
    <field name="price">0</field>
  </doc>
</add>


Isn't there a way to easily marshal that file into a SolrInputDocument, or do
I have to do the parsing myself?

I need them as Java POJOs because I want to modify some fields before indexing.
I would think that is possible with built-in methods in Solr, but I cannot
find a way.

thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/a-way-to-marshall-xml-doc-into-a-SolrInputDocument-tp3654777p3654777.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread Tomás Fernández Löbbe
Can those modifications be made on the server side? If so, you could create
an UpdateRequestProcessor. See
http://wiki.apache.org/solr/UpdateRequestProcessor

On Thu, Jan 12, 2012 at 5:19 PM, jmuguruza  wrote:

> If I have individual files in the expected Solr format (having just ONE doc
> per file):
>
> <add>
>   <doc>
>     <field name="id">GB18030TEST</field>
>     <field name="name">Test with some GB18030 encoded characters</field>
>     <field name="features">No accents here</field>
>     <field name="features">这是一个功能</field>
>     <field name="price">0</field>
>   </doc>
> </add>
>
> Isn't there a way to easily marshal that file into a SolrInputDocument, or do
> I have to do the parsing myself?
>
> I need them as Java POJOs because I want to modify some fields before
> indexing.
> I would think that is possible with built-in methods in Solr, but I cannot
> find a way.
>
> thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/a-way-to-marshall-xml-doc-into-a-SolrInputDocument-tp3654777p3654777.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread jmuguruza
Even if they could (I'm not sure they can be done there, as they involve
properly formatting some fields so that dates are in the correct format, etc.,
and maybe the format is checked first), I would prefer to do it on the SolrJ
side, as the code will be much simpler for me.

thanks
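If you do end up doing the parsing on the SolrJ side, here is a minimal hand-rolled sketch with StAX (this is not a built-in Solr API; boost attributes and other edge cases are ignored):

import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.apache.solr.common.SolrInputDocument;

public class AddXmlReader {
  /** Parses <add><doc><field name="...">value</field>...</doc></add> into SolrInputDocuments. */
  public static List<SolrInputDocument> read(Reader xml) throws XMLStreamException {
    List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
    SolrInputDocument doc = null;
    String fieldName = null;
    StringBuilder value = new StringBuilder();
    while (r.hasNext()) {
      switch (r.next()) {
        case XMLStreamConstants.START_ELEMENT:
          if ("doc".equals(r.getLocalName())) {
            doc = new SolrInputDocument();
          } else if ("field".equals(r.getLocalName())) {
            fieldName = r.getAttributeValue(null, "name");
            value.setLength(0);
          }
          break;
        case XMLStreamConstants.CHARACTERS:
          if (fieldName != null) value.append(r.getText());
          break;
        case XMLStreamConstants.END_ELEMENT:
          if ("field".equals(r.getLocalName()) && doc != null) {
            doc.addField(fieldName, value.toString());  // reformat dates etc. here before adding
            fieldName = null;
          } else if ("doc".equals(r.getLocalName()) && doc != null) {
            docs.add(doc);
            doc = null;
          }
          break;
      }
    }
    return docs;
  }
}

The resulting SolrInputDocuments can then be sent with SolrServer.add(docs) as usual.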

--
View this message in context: 
http://lucene.472066.n3.nabble.com/a-way-to-marshall-xml-doc-into-a-SolrInputDocument-tp3654777p3655033.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SpatialSearch, geofilt and documents missing a value in sfield

2012-01-12 Thread Smiley, David W.
Hi Tanguy,

On Jan 11, 2012, at 6:14 AM, Tanguy Moal wrote:

> Dear ML,
> 
> I'm performing some developments relying on spatial capabilities of solr.
> 
> I'm using Solr 3.5, have been reading 
> http://wiki.apache.org/solr/SpatialSearch#Spatial_Query_Parameters and have 
> the basic behaviours I wanted working.
> I use geofilt on a latlong field, with geodist() in the bf parameter.
> 
> When I do q=*:*&fq={!geofilt pt=x,y d=r unit=km 
> sfield=coordinates}&defType=edismax everything works fine.
> 
> But in some cases, documents don't have coordinates.
> For example, some of them refer to a city, so they have coordinates, while 
> others are not so precisely geolocated and simply refer to a broader area, a 
> region or a state, if you will.

You've seen this; right?
http://wiki.apache.org/solr/SpatialSearch#How_to_combine_with_a_sub-query_to_expand_results

> I tried with different queries :
> 
> - Include results from a broader area : q=*:*&fq=(state:FL OR 
> _query_:"{!geofilt ...}") .
> => That works fine (i.e. results showing up), but not as expected : this only 
> returns documents having FL as value in the state field AND some value in the 
> coordinates field *or* documents around my point but not documents without a 
> value in the coordinates field…

Your explanation of what happens is not consistent with what this query does.  
The filter query is OR, not AND.  The xml example docs that come with Solr 
don't all include a value in the "store" LatLonType field, so if what you claim 
is true, you should be able to prove it with a query against that data set we 
all have.  Please try to do so; I think you are mistaken.

> - Include results from a broader area, feeling lucky : 
> q=*:*&fq=((state:FL%20AND%20-coordinates:[*%20TO%20*])%20OR%20_query_:"{!geofilt%20pt=x,y%20d=r%20unit=km%20sfield=coordinates}")
>  
> => which does what is asked to... Return both the results with FL in the 
> state field and no value in the coordinates field *plus* results within a 
> radius around a point, *but* the problem is that in that case, the solr 
> search layer dies unconditionnally with the following stack :
>> Problem accessing /solr/geo_xpe/select. Reason:
>> 
>>null
>> 
>> java.lang.NullPointerException
>>at 
>> org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:351)
>>at org.apache.solr.schema.LatLonType.getRangeQuery(LatLonType.java:95)
>>at 
>> org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:165)
...
> Of course, it doesn't make sense to expect the distance computation to work 
> with documents lacking value in the coordinate field!

Arguably this is a bug.  LatLonType doesn't handle open-ended range queries and 
it didn't check for a null argument defensively either.  This will happen 
whether there is indexed data or not.

[* TO *] queries are slow, particularly when there are many values -- like at 
least a thousand.  If you want to perform this type of query, instead index a 
boolean field corresponding to another field that indicates whether that field 
has a value.  This would be a good use of an UpdateRequestProcessor but you can 
just as well do it elsewhere.
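For reference, a minimal sketch of that UpdateRequestProcessor idea; the field names (coordinates, has_coordinates) and the factory name are assumptions:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class HasCoordinatesProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // flag documents so queries can filter on has_coordinates:true
        // instead of the slow coordinates:[* TO *]
        doc.setField("has_coordinates", doc.getFieldValue("coordinates") != null);
        super.processAdd(cmd);
      }
    };
  }
}

It would be wired into an updateRequestProcessorChain in solrconfig.xml and referenced from the update handler.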

> From a user perspective, having the possibility to define a default distance 
> to be returned for document missing a value in the coordinate field could be 
> helpful... If something like sortMissingFirst or sortMissingLast is specified 
> on the field.
> * sortMissingLast="true" could be obtained with a +Inf distance returned if 
> no value in the field
> * sortMissingFirst="true" could be obtained with a 0 distance returned if no 
> value in the field
> 
> I may be misunderstanding concepts, but those sorting attributes seem to only 
> apply for sorting and not to the documents selection process (geofilt)..? I 
> know that since solr3.5, it's possible to define sortMissing(Last|First) on 
> trie-based fields, but I don't know what happens for fields defined that way :
> [schema excerpt stripped by the mailing list archive; it defines a
> double-based fieldType (omitNorms="true" positionIncrementGap="0"), a
> LatLonType fieldType (sortMissingLast="true" omitNorms="true"
> subFieldType="double"), and a coordinates field with multiValued="false"]
> 
> Help is welcome!

Indeed, sortMissing etc. are used in sorting, and play no part in whether a 
document matches or not.  And for LatLonType, they won't do anything.  
LatLonType uses a pair of double fields under the hood, as seen in your 
schema excerpt.  You could put those attributes there but I don't think that 
would work.  I was playing around with blank values yesterday and I found that 
blank values result in a distance away from the query point that is very large… 
I forget what value it was but you can try yourself.

~ David Smiley



replication failure, logs or notice?

2012-01-12 Thread Jonathan Rochkind
I think maybe my Solr 1.4 replications have been failing for quite some 
time, without me realizing it, possibly due to lack of disk space to 
replicate some large segments.


Where would I look to see if a replication failed? Just the standard 
solr log?  What would I look for?


There's no facility to have, like an email sent if replication fails or 
anything, is there?


I realize that Solr/Java logging is something that still confuses me, and 
I've done whatever was easiest. But I vaguely remember that by picking the 
right logging framework and configuring it properly, you can send different 
types of events to different logs, for example replication events to their 
own log? Is this a thing?
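If it helps: the java.util.logging setup the stock Solr 1.4 war uses (via slf4j's JDK binding) does allow per-logger configuration, so something along these lines in a logging.properties could route replication activity to its own file. Treat the logger names as an assumption about which classes Solr 1.4 replication logs under:

org.apache.solr.handler.ReplicationHandler.level = INFO
org.apache.solr.handler.SnapPuller.level = INFO
org.apache.solr.handler.ReplicationHandler.handlers = java.util.logging.FileHandler
org.apache.solr.handler.SnapPuller.handlers = java.util.logging.FileHandler
org.apache.solr.handler.ReplicationHandler.useParentHandlers = false
org.apache.solr.handler.SnapPuller.useParentHandlers = false
java.util.logging.FileHandler.pattern = logs/replication.%g.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

With useParentHandlers set to false those messages would no longer also show up in the main log, so leave it out if you want both.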


Thanks for any ideas,

Jonathan




can solr automatically search for different punctuation of a word

2012-01-12 Thread alxsss
Hello,

I would like to know if Solr has functionality to automatically search for
different accented forms of a word.
For example, if a user searches for the word Uber, and the stemmer is for
German, then Solr looks for both Uber and Über, like with synonyms.

Is it possible to give Solr a file with a list of possible letter
substitutions and have it search for all possible forms?


Thanks.
Alex.
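For what it's worth, the "file with a list of letter substitutions" idea exists in Solr as a char filter; a sketch of an analyzer using it (the mapping file mapping-ISOLatin1Accent.txt ships with the Solr example config, and ASCIIFoldingFilterFactory is another option):

<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <!-- maps accented characters to their unaccented forms per the mapping file,
         applied at both index and query time -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>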


Re: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread yunfei wu
I guess you probably ran into a mismatch between the date value format in
your Oracle DB and the Solr field. Solr only expects XML date values in
UTC format -
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html.

You might need to consider DateFormatTransformer -
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer

Yunfei


On Thu, Jan 12, 2012 at 10:05 AM, Joey Grimm  wrote:

> Hi,
>
> I am trying to use a dataImportHandler to import data from an oracle DB.
>  It
> works for non-date fields but is throwing an exception once I included the
> MODIFIEDDATE field (oracle.timestamp field).  Can anyone see what I'm doing
> wrong here?  Thanks.
>
>
>
> schema.xml
>   
>
> db-data-config.xml
>
> query="SELECT
> ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY">
>
>
>
>
>
>
> name="catModifiedDate"/>
>
>
> WARNING: Error creating document :
>
> SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565
> },
> masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118},
> catIconId=catIconId(1.0)={304856}}]
> org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field
> 'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
>at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
>at
>
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>at
>
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
>at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
>at
>
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
>at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.solr.common.SolrException: Invalid Date
> String:'oracle.sql.TIMESTAMP@1e58565'
>at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
>at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
>at
> org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
>at
> org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
>at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
>at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrException-Invalid-Date-String-oracle-sql-TIMESTAMP-tp3654419p3654419.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Relevancy and random sorting

2012-01-12 Thread Chris Hostetter

: We have a listing aggregator that gets product listings from a lot of
: different sites and since they are added in batches, sometimes you see a
: lot of pages from the same source (site). We are working on some changes to
: shift things around and reduce this "blocking" effect, so we can present
: mixed sources on the result pages.

if the problem you are seeing is strings of docs all in a clump because 
they have the same *score* then just add a secondary sort on your random 
field - in the example you posted, you completely replace the sort by 
score with sort by random...

sort = score desc, random_1 desc

but that will only help differentiate when the scores are identical.

alternatively: you could probably use a random field in your biasing 
function, although you should probably use something like the "map" or 
"scale" functions to keep it from having too much of a profound impact on 
the final score.

maybe something like...

q={!boost 
b=product(scale(random_1,1,5),recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1))}
  active:true AND featured:false +_val_:haspicture

-Hoss


Re: Question about updating index with custom field types

2012-01-12 Thread 罗赛
Hi Sylvain,

I'm very sorry that I could not help you, as I'm also working on a pure-English
project...


Erick,

Thanks for your approach, I'll try it.

Luo Sai


On Wed, Jan 11, 2012 at 10:08 PM, Erick Erickson wrote:

> I'm not sure what custom field types have to do with XML here.
> Somewhere, you have to have defined a *field* in your schema.xml
> that references your custom type, something like:
> [field definition stripped by the mailing list archive]
>
> then the XML is just like any other field:
> [field element stripped by the archive; the example value was 56.75]
>
>
> WARNING: I don't quite know how to access the attributes
> down in your special code, I haven't had the occasion
> to actually do that so I don't know whether the attributes
> are carried down through the document parsing
>
> Best
> Erick
>
> On Tue, Jan 10, 2012 at 4:20 AM, 罗赛  wrote:
> > Hello everyone,
> >
> > I have a question on how to update index using xml messages when there
> are
> > some complex custom field types in my index...like:
> > 
> > And field offer has some attributes in it...
> >
> > I've read page, http://wiki.apache.org/solr/UpdateXmlMessages and
> example
> > shows that xml should be like:
> >
> > <add>
> >   <doc>
> >     <field name="employeeId">05991</field>
> >     <field name="office">Bridgewater</field>
> >     <field name="skills">Perl</field>
> >     <field name="skills">Java</field>
> >   </doc>
> >   [<doc> ... </doc>[<doc> ... </doc>]]
> > </add>
> >
> >
> > So, could you tell me how to write the XML, or is there any other method to
> > update the index with custom field types?
> >
> > Thanks,
> >
> > --
> > Best wishes
> >
> > Sai
>



-- 
Best wishes

罗赛

Tel 13811219876


Re: Solr 3.3 crashes after ~18 hours?

2012-01-12 Thread cowwoc
I believe this issue is related to this Jetty bug report:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=357318

Gili

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-crashes-after-18-hours-tp3218496p3655937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemming numbers

2012-01-12 Thread Chris Hostetter

: We've had some issues with people searching for a document with the
: search term '200 movies'. The document is actually title 'two hundred
: movies'.
: 
: Do we need to add every number to our  synonyms dictionary to
: accomplish this? Is it best done at index or search time?

if all you care about is english, there's actually an 
"English.longToEnglish" method in the lucene test-framework that was 
used to generate test corpora back in the Lucene 1.x days ... i 
don't actually think it's used in any Lucene tests anymore at all.

could probably whip up a filter using that in about a dozen lines of code 
... but it still wouldn't handle things like "dozen" (or "half dozen" or 
"gross") but it's there if you want to try.


-Hoss