Re: Get metadata for query

2012-10-27 Thread Lance Norskog
Nope! Each document comes back with its own list of stored fields. If you want 
to find all fields in an index, you have to fetch every last document and OR in 
the fields in that document. There is no Solr call to get a full list of static 
or dynamic fields.

If you use lots of dynamic fields I can see how this would be useful for 
pan-index tasks like assessing data quality.

- Original Message -
| From: "Jack Krupansky" 
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 7:41:58 PM
| Subject: Re: Get metadata for query
| 
| I'm not sure I understand the real question here. What is the
| "metadata".
| 
| I mean, q=x&fl=* gives you all the (stored) fields for documents
| matching
| the query.
| 
| What else is there?
| 
| -- Jack Krupansky
| 
| -Original Message-
| From: Lance Norskog
| Sent: Friday, October 26, 2012 9:42 PM
| To: solr-user@lucene.apache.org
| Subject: Re: Get metadata for query
| 
| Ah, there's the problem- what is a fast way to fetch all fields in a
| collection, including dynamic fields?
| 
| - Original Message -
| | From: "Otis Gospodnetic" 
| | To: solr-user@lucene.apache.org
| | Sent: Friday, October 26, 2012 3:05:04 PM
| | Subject: Re: Get metadata for query
| |
| | Hi,
| |
| | No... but you could simply query your index, get all the fields you
| | need and process them to get what you need.
| |
| | Otis
| | --
| | Search Analytics - http://sematext.com/search-analytics/index.html
| | Performance Monitoring - http://sematext.com/spm/index.html
| |
| |
| | On Fri, Oct 26, 2012 at 10:19 AM, Torben Honigbaum
| |  wrote:
| | > Hi everybody,
| | >
| | > with http://localhost:8983/solr/admin/luke it's possible to get
| | > metadata for all indices. But is there a way to get only the
| | > metadata for a special query? I want to query all documents which
| | > are in a special category. For the query I need the metadata
| | > containing a list of all fields of the documents.
| | >
| | > Thank you
| | > Torben
| | 
| 
| 


Re: Search and Entity structure

2012-10-27 Thread v vijith
Hi,

If I write a query like this, is there a way I can achieve the results
that I need?

select * from employee a left outer join qualification b on a.empid = b.empid;

This will return 5 records, 1 per employee qualification. Can this be
indexed as is?

1, John, MBA, A
1, John, Lead, B
2, George, MBA, B
2, George, PM, C
3, Viktor, null, null

How do I set up the schema so that this can be searched as is?

I would prefer not to put database-specific functions in my solution.

On Sat, Oct 27, 2012 at 7:08 AM, Gora Mohanty  wrote:
> On 27 October 2012 01:20, v vijith  wrote:
> [...]
>> The dataconfig file is
>> 
>> 
>> 
>> 
>> 
> [...]
>
> The SELECT in the nested entity "qualification" should fetch
> all qualifications for the given employee. How to do that is
> database dependent, e.g., one would use something like
> group_concat() in mysql. After collecting multiple qualifications
> in a single string, one can use a transformer to break the
> string at the separator used in group_concat(), and populate
> the desired Solr field with the pieces.
>
> Depending on your expertise, it might be easier to do this
> through a Solr XML document, or SolrJ.
>
> Regards,
> Gora
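
Gora's suggestion quoted above (concatenate the qualifications in SQL, then split the string on the separator) could look roughly like the following sketch. It is plain Python rather than a DIH transformer, and the column names, the sample rows and the "|" separator are assumptions made for illustration. Within DIH itself, the RegexTransformer's splitBy attribute is the usual way to do the split.

```python
# Hypothetical separator; it must match the one used by GROUP_CONCAT in the SQL.
SEPARATOR = "|"


def split_concatenated(row, column, separator=SEPARATOR):
    """Return a copy of row with the concatenated column split into a list."""
    result = dict(row)
    value = result.get(column)
    # A NULL from the LEFT OUTER JOIN (an employee without qualifications)
    # becomes an empty list instead of raising an error.
    result[column] = value.split(separator) if value else []
    return result


# Illustrative rows, shaped like one GROUP_CONCAT result per employee.
rows = [
    {"empid": 1, "name": "John", "gradename": "MBA|Lead"},
    {"empid": 3, "name": "Viktor", "gradename": None},
]
docs = [split_concatenated(row, "gradename") for row in rows]
```

Each resulting list would populate one multi-valued Solr field. Note that splitting two columns independently does not by itself keep a qualification paired with its grade.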


Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-27 Thread Dominik Siebel
In fact there are fields that have a NULL value but they are already
taken care of in the SQL Query like: IF(field_name IS NULL, '',
field_name).
Also it's not just single rows that fail. It's all of them.
It does not seem to have anything to do with the data that's coming
from the database. If I omit the dataimporter.functions.escapeSql the
importer at least processes those rows that don't have any SQL meta
characters in them.

Sadly, I haven't had time to put together a test to verify that the
error really emerges from the DIH itself.
Do you have any experience with bug reporting or submitting patches
(just in case there really IS a bug)?


2012/10/27 Lance Norskog :
> Which database rows cause the problem? The bug report talks about fields with 
> an empty string. Do your rows have empty string values?
>
> - Original Message -
> | From: "Dominik Siebel" 
> | To: solr-user@lucene.apache.org
> | Sent: Monday, October 22, 2012 3:15:29 AM
> | Subject: Re: DIH throws NullPointerException when using
> | dataimporter.functions.escapeSql with parent entities
> |
> | That's what I thought.
> | I'm just curious that nobody else seems to have this problem although
> | I found the exact same issue description in the issue tracker
> | (https://issues.apache.org/jira/browse/SOLR-2141) which goes back to
> | October 2010 and is flagged as "Resolved: Cannot Reproduce".
> |
> |
> | 2012/10/20 Lance Norskog :
> | > If it worked before and does not work now, I don't think you are
> | > doing anything wrong :)
> | >
> | > Do you have a different version of your JDBC driver?
> | > Can you make a unit test with a minimal DIH script and schema?
> | > Or, scan through all of the JIRA issues against the DIH from your
> | > old Solr capture date.
> | >
> | >
> | > - Original Message -
> | > | From: "Dominik Siebel" 
> | > | To: solr-user@lucene.apache.org
> | > | Sent: Thursday, October 18, 2012 11:22:54 PM
> | > | Subject: Fwd: DIH throws NullPointerException when using
> | > | dataimporter.functions.escapeSql with parent entities
> | > |
> | > | Hi folks,
> | > |
> | > | I am currently migrating our Solr servers from a 4.0.0 nightly
> | > | build (approx. November 2011, which worked very well) to the
> | > | newly released 4.0.0 and am running into some issues concerning
> | > | the existing DataImportHandler configurations. Maybe you have an
> | > | idea where I am going wrong here.
> | > |
> | > | The following lines are a highly simplified excerpt from one of
> | > | the problematic imports:
> | > |
> | > | 
> | > |
> | > | 
> | > |
> | > | 
> | > |
> | > | While this configuration has worked without any problem for over
> | > | half a year now, when upgrading to 4.0.0-BETA and 4.0.0 the
> | > | import throws the following stack trace and exits:
> | > |
> | > |  SEVERE: Exception while processing: path document :
> | > | null:org.apache.solr.handler.dataimport.DataImportHandlerException:
> | > | java.lang.NullPointerException
> | > |
> | > | which is caused by
> | > |
> | > | Caused by: java.lang.NullPointerException
> | > | at org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79)
> | > |
> | > | In other words: The EvaluatorBag doesn't seem to resolve the
> | > | given path.name variable properly and returns null.
> | > |
> | > | Does anyone have any idea?
> | > | Appreciate your input!
> | > |
> | > | Regards
> | > | Dom
> | > |
> |


Re: Search and Entity structure

2012-10-27 Thread v vijith
Indeed, this worked.

The fix that was required was related to how the document is
represented. It depends on the unique key: for the same unique key,
Solr will always update the existing document. To avoid that, I used an
Oracle sequence to identify each record. It could also be the Oracle
row number or, if neither is wanted, a UUID field from Solr.

Now I'm able to search for gradename:MBA AND grade:B with the relation
maintained.
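
A minimal sketch of why the unique key matters, using the five joined rows from earlier in the thread. The dictionary-based "index" only mimics Solr's replace-on-same-key behaviour; it is not Solr code, the field names empid and name are assumed from the SQL, and uuid4 stands in for the Oracle sequence.

```python
import uuid


def rows_to_docs(rows):
    """Turn each joined row into one document with its own generated key."""
    docs = []
    for empid, name, gradename, grade in rows:
        doc = {"someid": str(uuid.uuid4()), "empid": empid, "name": name}
        # Skip NULL columns so Viktor's document simply lacks these fields.
        if gradename is not None:
            doc["gradename"] = gradename
        if grade is not None:
            doc["grade"] = grade
        docs.append(doc)
    return docs


def index_by_key(docs, key):
    """Mimic unique-key semantics: a later document replaces an earlier one."""
    index = {}
    for doc in docs:
        index[doc[key]] = doc
    return index


rows = [
    (1, "John", "MBA", "A"),
    (1, "John", "Lead", "B"),
    (2, "George", "MBA", "B"),
    (2, "George", "PM", "C"),
    (3, "Viktor", None, None),
]
docs = rows_to_docs(rows)
by_empid = index_by_key(docs, "empid")    # rows sharing an empid overwrite each other
by_someid = index_by_key(docs, "someid")  # every row survives as its own document
matches = [d for d in docs if d.get("gradename") == "MBA" and d.get("grade") == "B"]
```

Keyed on empid, only three documents remain. Keyed on the generated id, all five remain, and the combined gradename/grade condition matches only George's MBA row.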

The updated dataconfig is shown below





  



Schema

   
   
   
   
   

 someid



On Sat, Oct 27, 2012 at 10:49 AM, v vijith  wrote:
> Hi,
>
> If I write a query like this, is there a way I can achieve the results
> that I need?
>
> select * from employee a left outer join qualification b on a.empid = b.empid;
>
> This will return 5 records, 1 per employee qualification. Can this be
> indexed as is?
>
> 1, John, MBA, A
> 1, John, Lead, B
> 2, George, MBA, B
> 2, George, PM, C
> 3, Viktor, null, null
>
> How do I set up the schema so that this can be searched as is?
>
> I would prefer not to put database-specific functions in my solution.


Re: Get metadata for query

2012-10-27 Thread Erik Hatcher
Lance Lance Lance :)  As the OP said, you can use /admin/luke to get all 
the fields (static and dynamic) used in the index.  I've used that trick to get 
a list of all *_facet dynamic fields to then have my UI (Blacklight's first 
prototypes, aka Solr Flare) turn around and facet on them.  The request to 
/admin/luke was done once and cached.
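
As a rough sketch of that trick: fetch the Luke handler output once as JSON and filter the field names by pattern. The URL is the one from Torben's original question with numTerms=0 and wt=json added. That the response carries a top-level "fields" object keyed by field name is an assumption to check against your own Solr version, and the sample dictionary below is made up.

```python
import fnmatch
import json
import urllib.request

# Assumed local Solr instance; numTerms=0 skips the per-field top-terms lists.
LUKE_URL = "http://localhost:8983/solr/admin/luke?numTerms=0&wt=json"


def fetch_luke(url=LUKE_URL):
    """Fetch the Luke handler output once; callers should cache the result."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def field_names(luke_response, pattern="*"):
    """Return the sorted field names in the response that match a glob pattern."""
    fields = luke_response.get("fields", {})
    return sorted(name for name in fields if fnmatch.fnmatchcase(name, pattern))


# Made-up response fragment, only to show the filtering step.
sample = {
    "fields": {
        "id": {"type": "string"},
        "color_facet": {"type": "string"},
        "size_facet": {"type": "string"},
        "title": {"type": "text_general"},
    }
}
facet_fields = field_names(sample, "*_facet")
```

Against a live server this would be field_names(fetch_luke(), "*_facet"), called once and cached.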

But I think what Torben is going for is the "FieldsUsedUpdateProcessor" trick 
like .

In Solr 4 there is a JavaScript update processor example, commented out, that 
will add a field to every document containing the names of the fields 
(constrained to the name pattern of attr_* in the example) for that document.  
One can then use that to facet upon. 
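
The idea behind that update processor can be illustrated in a few lines. This is a Python stand-in for the logic only, not the JavaScript shipped with Solr. The attr_* pattern and the attribute_ss target field come from this thread, while storing the full field name (rather than some part of it) is an assumption.

```python
import fnmatch


def add_fields_used(doc, pattern="attr_*", target="attribute_ss"):
    """Return a copy of doc with a multi-valued field naming its matching fields."""
    names = sorted(name for name in doc if fnmatch.fnmatchcase(name, pattern))
    result = dict(doc)
    if names:
        # One value per matching field name, so the field can be faceted on.
        result[target] = names
    return result


# Illustrative document; the field names are invented.
doc = {"id": "1", "attr_color": "red", "attr_size": "L", "title": "example"}
indexed = add_fields_used(doc)
```

Because this runs once per document at indexing time, the extra field costs nothing at query time beyond an ordinary facet.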

In Solr 4, it's here: 


Note, the field name in a comment in there is incorrect (I'll commit a fix), 
but if you used that update processor, you could then do a query and facet on 
field attribute_ss and across that result set see what fields are contained 
within it.  I've seen this trick employed at the Smithsonian first hand, where 
there are so many different attributes across the documents that it's hard to 
know what the best facets are for the result set.
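
A sketch of the query side, assuming the attribute_ss field has been populated at indexing time: ask for zero rows and facet on that field, then read the facet counts. The /select URL and the category:books query are placeholders. The flat [name, count, name, count, ...] list is Solr's default JSON facet layout as far as I know, so check it against your response.

```python
from urllib.parse import urlencode


def fields_used_query(base_url, query, field="attribute_ss"):
    """Build a facet-only request listing the fields used in a result set."""
    params = [
        ("q", query),
        ("rows", 0),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),  # hide fields absent from the result set
        ("facet.limit", -1),    # return every field name, not just the top ones
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


def parse_flat_facets(flat):
    """Convert a flat [name, count, name, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


url = fields_used_query("http://localhost:8983/solr/select", "category:books")
# Made-up facet output for illustration.
counts = parse_flat_facets(["attr_color", 12, "attr_size", 7])
```

The keys of counts are then the field names present in the documents matching the query, which is what Torben asked for.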

Erik


On Oct 27, 2012, at 04:09 , Lance Norskog wrote:

> Nope! Each document comes back with its own list of stored fields. If you 
> want to find all fields in an index, you have to fetch every last document 
> and OR in the fields in that document. There is no Solr call to get a full 
> list of static or dynamic fields.
> 
> If you use lots of dynamic fields I can see how this would be useful for 
> pan-index tasks like assessing data quality.



Doc Transformer to remove document from the response

2012-10-27 Thread eks dev
A Transformer is great for augmenting documents before they are shipped
in the response, but what would be a way to prevent a document from
being delivered?

I have some search components that draw conclusions after the search
(duplicate removal, clustering) and one Augmenter (a Solr Transformer)
to shape the response, but I need to stop some documents from being
delivered. What is the way to do it?


thanks, e.


Re: Get metadata for query

2012-10-27 Thread Lance Norskog
Erk, haven't used /luke in years. Apologies.

About that JS: does distributed search "do the right thing" when the 
distributed part is not implemented? Or does every script have to explicitly 
include distributed search support?

- Original Message -
| From: "Erik Hatcher" 
| To: solr-user@lucene.apache.org
| Sent: Saturday, October 27, 2012 4:14:12 AM
| Subject: Re: Get metadata for query
| 
| Lance Lance Lance :)  As the OP said, you can use /admin/luke to
| get all the fields (static and dynamic) used in the index.  I've
| used that trick to get a list of all *_facet dynamic fields to then
| have my UI (Blacklight's first prototypes, aka Solr Flare) turn
| around and facet on them.  The request to /admin/luke was done once
| and cached.
| 
| But I think what Torben is going for is the
| "FieldsUsedUpdateProcessor" trick like
| .
| 
| In Solr 4 there is a JavaScript update processor example, commented
| out, that will add a field to every document containing the names of
| the fields (constrained to the name pattern of attr_* in the
| example) for that document.  One can then use that to facet upon.
| 
| In Solr 4, it's here:
| 

| 
| Note, the field name in a comment in there is incorrect (I'll commit
| a fix), but if you used that update processor, you could then do a
| query and facet on field attribute_ss and across that result set see
| what fields are contained within it.  I've seen this trick employed
| at the Smithsonian first hand, where there are so many different
| attributes across the documents that it's hard to know what the best
| facets are for the result set.
| 
|   Erik


Re: lukeall.jar for Solr4r?

2012-10-27 Thread Lance Norskog
Aha! Andrzej has not built a 4.0 release version. You need to check out the 
source and compile your own.

http://code.google.com/p/luke/downloads/list

- Original Message -
| From: "Carrie Coy" 
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 7:33:45 AM
| Subject: lukeall.jar for Solr4r?
| 
| Where can I get a copy of Luke capable of reading Solr4 indexes?  My
| lukeall-4.0.0-ALPHA.jar no longer works.
| 
| Thx,
| Carrie Coy
| 


Re: Get metadata for query

2012-10-27 Thread Erik Hatcher
Distributed *search*? It'll do the right thing, as this is an update 
processor and is only invoked during indexing. Maybe you meant distributed 
indexing, à la SolrCloud? That should also work fine, just like any other 
straightforward update processor that adds/updates/removes fields from incoming 
documents.

Erk (lol!)

On Oct 27, 2012, at 21:17 , Lance Norskog wrote:

> Erk, haven't used /luke in years. Apologies.
> 
> About that JS: does distributed search "do the right thing" when the 
> distributed part is not implemented? Or does every script have to explicitly 
> include distributed search support?



Re: throttle segment merging

2012-10-27 Thread Radim Kolar

On 26.10.2012 3:47, Tomás Fernández Löbbe wrote:

>> Is there a way to set up logging to output something when segment
>> merging runs?
>
> I think segment merging is logged when you enable infoStream logging (you
> should see it commented out in the solrconfig.xml)

No, segment merging is not logged at info level. It needs a customized log
config.

>> Can segment merges be throttled?
>
> You can change when and how segments are merged with the merge
> policy. Maybe changing the initial settings (mergeFactor, for
> example) is enough for you?

I am now researching Elasticsearch. It can do it, and it is Lucene 3.6 based.