Re: SolrCloud removing shard (how to not lose data)

2013-01-11 Thread mizayah
Mark, I know I still have access to the data and I can wake up the shard again.

What I want to do is this:


I have 3 shards on 3 nodes, one on each. Now I discover that I don't need 3
nodes and I want only 2.
So I want to remove one shard and move its data to the shards that are left.

Is there a way to keep that data without being forced to index it again?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: retrieving latest document **only**

2013-01-11 Thread Uwe Reh

On 10.01.2013 11:54, jmozah wrote:

I need a query that matches only the most recent ones...
Because my stats depend on it..

But I have a requirement to show **only** the latest documents and the
"stats" along with it..


What do you want?
'the most recent ones' or '**only** the latest'?

Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your needs.
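
For example, restricted to the last day (a sketch, assuming a date field 
named "timestamp"):

   q=timestamp:[NOW-1DAY TO NOW]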

Uwe



Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen

Hi

I read http://wiki.apache.org/solr/SolrSecurity and know a lot about 
webcontainer authentication and authorization. I'm sure I will be able to 
set it up so that each Solr node will require HTTP authentication for 
(selected) incoming requests.


But Solr nodes also make requests among each other, and I'm in doubt whether 
credentials are forwarded from the "original request" to the internal 
sub-requests.
E.g. let's say that each Solr node is set up to require authentication 
for search requests. An "outside" user makes a distributed request 
including a correct username/password. Since it is a distributed search, 
the node which handles the original request from the user will have to 
make sub-requests to other Solr nodes, but they also require correct 
credentials in order to accept these sub-requests. Are the credentials 
from the original request duplicated to the sub-requests, or what options 
do I have?
The same thing goes for e.g. update requests if they are sent to a node 
which does not run (all) the replicas of the shard to which the documents 
to be added/updated/deleted belong. The node needs to make sub-requests 
to other nodes, and that requires forwarding the credentials.


Does this just work out of the box, or ... ?

Regards, Per Steffensen


Re: Auto completion

2013-01-11 Thread anurag.jain
in solrconfig.xml

   <requestHandler name="/browse" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">edismax</str>
       <str name="qf">
         text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
         branch_name^1.1 hq_passout_year^1.4
         course_type^10.0 institute_name^5.0 qualification_type^5.0
         mail^2.0 state_name^1.0
       </str>
       <str name="df">text</str>
       <str name="mm">100%</str>
       <str name="q.alt">*:*</str>
       <str name="rows">10</str>
       <str name="fl">*,score</str>

       <str name="mlt.qf">
         text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
         branch_name^1.1 hq_passout_year^1.4
         course_type^10.0 institute_name^5.0 qualification_type^5.0
         mail^2.0 state_name^1.0
       </str>
       <str name="mlt.fl">text,last_name,first_name,course_name,id,branch_name,hq_passout_year,course_type,institute_name,qualification_type,mail,state_name</str>
       <int name="mlt.count">3</int>

       <str name="facet">on</str>
       <str name="facet.field">is_top_institute</str>
       <str name="facet.field">course_name</str>
       <str name="facet.range">cgpa</str>
       <int name="f.cgpa.facet.range.start">0</int>
       <int name="f.cgpa.facet.range.end">10</int>
       <int name="f.cgpa.facet.range.gap">2</int>
     </lst>
   </requestHandler>

and in schema.xml

   [the field and fieldType definitions were stripped from the archived message]

So please now tell me: what should the JavaScript (the terms.fl parameter) be? And 
what about conf/velocity/head.vm, and the 'name' reference in suggest.vm?


Please reply.. and thanks for the previous reply..  :-)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-completion-tp4032267p4032450.html
Sent from the Solr - User mailing list archive at Nabble.com.


which way for "export"

2013-01-11 Thread stockii
hello.

What is the best/fastest way to get the values of many fields from the index?

My problem is that I need to calculate a sum of amounts. This amount is in
my index (stored="true"). My PHP script gets all the values with paging, but if a
request takes too long, Jetty kills this "export" process.

Is it better to get all the fields with "wt=csv/json/xml", or with some other
handler?
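
For example, something like this (a sketch):

   http://localhost:8983/solr/select?q=*:*&fl=amount&wt=csv&rows=1000000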



--
View this message in context: 
http://lucene.472066.n3.nabble.com/which-way-for-export-tp4032487.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hi,

If your credentials are fixed I would configure username:password in your 
request handler's shardHandlerFactory configuration section and then modify 
HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope 
configured with those settings.

I don't think you can obtain the original credentials very easily from inside 
HttpShardHandlerFactory.
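
Roughly like this (an untested sketch against the HttpClient 4.x API):

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.DefaultHttpClient;

// apply the fixed credentials from the configuration to every host/port
DefaultHttpClient httpClient = new DefaultHttpClient();
httpClient.getCredentialsProvider().setCredentials(
    new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT),
    new UsernamePasswordCredentials("user", "pass"));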

Cheers 
 
-Original message-
> From:Per Steffensen 
> Sent: Fri 11-Jan-2013 13:07
> To: solr-user@lucene.apache.org
> Subject: Forwarding authentication credentials in internal node-to-node 
> requests
> 
> Hi
> 
> I read http://wiki.apache.org/solr/SolrSecurity and know a lot about 
> webcontainer authentication and authorization. Im sure I will be able to 
> set it up so that each solr-node is will require HTTP authentication for 
> (selected) incoming requests.
> 
> But solr-nodes also make requests among each other and Im in doubt if 
> credentials are forwarded from the "original request" to the internal 
> sub-requests?
> E.g. lets say that each solr-node is set up to require authentication 
> for search request. An "outside" user makes a distributed request 
> including correct username/password. Since it is a distributed search, 
> the node which handles the original request from the user will have to 
> make sub-requests to other solr-nodes but they also require correct 
> credentials in order to accept this sub-request. Are the credentials 
> from the original request duplicated to the sub-requests or what options 
> do I have?
> Same thing goes for e.g. update requests if they are sent to a node 
> which does not run (all) the replica of the shard in which the documents 
> to be added/updated/deleted belong. The node needs to make sub-request 
> to other nodes, and it will require forwarding the credentials.
> 
> Does this just work out of the box, or ... ?
> 
> Regards, Per Steffensen
> 


Re: retrieving latest document **only**

2013-01-11 Thread jmozah



> What do you want?
> 'the most recent ones' or '**only** the latest' ?
> 
> Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your needs.
> 
> Uwe
> 


I need **only** the latest documents...
In the above query, "refdate" can vary based on the query.

./zahoor





Re: retrieving latest document **only**

2013-01-11 Thread jmozah
One crude way is to first query and pick the latest date from the result,
then issue a query with q=timestamp:[latestDate TO latestDate].

But I don't want to execute two queries...

./zahoor

On 11-Jan-2013, at 6:37 PM, jmozah  wrote:

> 
> 
> 
>> What do you want?
>> 'the most recent ones' or '**only** the latest' ?
>> 
>> Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your needs.
>> 
>> Uwe
>> 
> 
> 
> I need **only** the latest documents...
> in the above query , "refdate" can vary based on the query.
> 
> ./zahoor
> 
> 
> 



Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Could you use field collapsing? Boost by date and only show one value
per group, and you'll have the most recent document only.
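
For example (a sketch; "doctype" stands for whatever field defines your groups, 
and "timestamp" for your date field):

   q=*:*&group=true&group.field=doctype&group.sort=timestamp desc&group.limit=1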

Upayavira

On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
> one crude way is first query and pick the latest date from the result
> then issue a query with q=timestamp[latestDate TO latestDate]
> 
> But i dont want to execute two queries...
> 
> ./zahoor
> 
> On 11-Jan-2013, at 6:37 PM, jmozah  wrote:
> 
> > 
> > 
> > 
> >> What do you want?
> >> 'the most recent ones' or '**only** the latest' ?
> >> 
> >> Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your needs.
> >> 
> >> Uwe
> >> 
> > 
> > 
> > I need **only** the latest documents...
> > in the above query , "refdate" can vary based on the query.
> > 
> > ./zahoor
> > 
> > 
> > 
> 


configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi!
I'm quite new to Solr and am trying to understand how to create a schema from 
our Postgres database and then search for the content in Solr instead of 
querying the DB.

My question should be really easy; it has most likely been asked many times, but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC, and has writingskill good / 
verbalskill good for English, and writingskill excellent / verbalskill 
excellent for Spanish.
UserB has taken courseA, courseF, courseG and courseH, and has writingskill 
fluent / verbalskill fluent for English, and writingskill good / verbalskill 
good for Italian.

I would like to put this data into Solr so I can search for all "users who have 
taken courseA and are fluent in English".
Can I do that?

The problem is that I'm not sure how to flatten this database into a schema.
It's easy to understand the users column, for example:

<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and 
languages:

<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />

Thanks for any help
/Niklas


Re: SolrCloud removing shard (how to not lose data)

2013-01-11 Thread mizayah
Seems I'm too lazy.
I found this: http://wiki.apache.org/solr/MergingSolrIndexes, and it really
works.
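
For anyone else reading: the merge call on that page looks roughly like this 
(core name and path are examples):

   http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/removed-shard/data/index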



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032508.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.0, slow opening searchers

2013-01-11 Thread Marcel Bremer
Hi,

We're experiencing slow startup times of searchers in Solr when it contains a 
large number of documents.

We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, 
spread across 9 cores. These documents contain keywords, with additional 
statistics, which we are using for suggestions and related keywords. When we 
(re)start Solr on one of our servers it can take up to two hours before Solr 
has opened all of its searchers and starts accepting connections again. We 
can't figure out why it takes so long to open those searchers. Also, the CPU and 
memory usage of Solr while opening searchers is not extremely high.

Are there any known issues or tips someone could give us to speed up opening 
searchers?

If you need more details, please ping me.


Best regards,

Marcel Bremer
Vinden.nl BV


Re: Index data from multiple tables into Solr

2013-01-11 Thread Dariusz Borowski
Hi!

I know the pain! ;)

That's why I wrote a bit on a blog, so I could remember it in the future. Here
is the link, in case you would like to read a tutorial on how to set up Solr
with multicore and hook it up to the database:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

I hope it helps!
D.



On Thu, Jan 10, 2013 at 6:19 PM, hassancrowdc wrote:

> Hi,
> i am trying to index multiple tables in solr. I am not sure which data
> config file to be changed there are so many of them(like solr-data-config,
> db-data-config)?
>
> Also, do i have to change the id, name and desc to the name of the columns
> in my table? and
>
> how do i add solr_details field in schema?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi Niklas,

Maybe this link helps:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.



On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig <
niklas.lang...@globesoft.com> wrote:

> Hi!
> I'm quite new to solr and trying to understand how to create a schema from
> how our postgres database and then search for the content in solr instead
> of querying the db.
>
> My question should be really easy, it has most likely been asked many
> times but still I'm not able to google any answer to it.
>
> To make it easy, I have 3 columns: users, courses and languages
>
> Users has columns , userid, firstname, lastname
> Courses has column coursename, startdate, enddate
> Languages has column language, writingskill, verbalskill
>
> UserA has taken courseA, courseB and courseC and has writingskill good
> verbalskill good for english and writingskill excellent verbalskill
> excellent for spanish
> UserB has taken courseA, courseF, courseG and courseH and has writingskill
> fluent verbalskill fluent for english and writingskill good verbalskill
> good for italian
>
> I would like to put this data into solr so I can search for all "users how
> have taken courseA and are fluent in english".
> Can I do that?
>
> The problem is I'm not sure how to flatten this database into a schema
> It's easy to understand the users column, for example
> 
> 
> 
>
> But then I'm not so sure how the schema should look like for courses and
> languages
> 
> 
> 
> 
>
>
> Thanks for any help
> /Niklas
>


SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Thinking about it some more:
perhaps I could have coursename and such as multivalued fields?

Or should I have separate indexes for users, courses and languages?

I get the feeling both would work, but I'm not sure which way is best.

When a user is updating/removing/adding a course it would be nice not to have to 
query the database for the user's courses and languages and update everything, but 
just update a course document.
But perhaps I'm thinking too much in database terms?

But still I'm unsure what the schema should look like.

Thanks
/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] 
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

Hi!
I'm quite new to solr and trying to understand how to create a schema from how 
our postgres database and then search for the content in solr instead of 
querying the db.

My question should be really easy, it has most likely been asked many times but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns , userid, firstname, lastname Courses has column coursename, 
startdate, enddate Languages has column language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC and has writingskill good 
verbalskill good for english and writingskill excellent verbalskill excellent 
for spanish UserB has taken courseA, courseF, courseG and courseH and has 
writingskill fluent verbalskill fluent for english and writingskill good 
verbalskill good for italian

I would like to put this data into solr so I can search for all "users how have 
taken courseA and are fluent in english".
Can I do that?

The problem is I'm not sure how to flatten this database into a schema. It's 
easy to understand the users column, for example
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure how the schema should look like for courses and 
languages
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />


Thanks for any help
/Niklas


SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hmm, I noticed I wrote that I have 3 columns: users, courses and languages.
I of course meant 3 tables: users, courses and languages.

/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] 
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

Hi!
I'm quite new to solr and trying to understand how to create a schema from how 
our postgres database and then search for the content in solr instead of 
querying the db.

My question should be really easy, it has most likely been asked many times but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns , userid, firstname, lastname Courses has column coursename, 
startdate, enddate Languages has column language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC and has writingskill good 
verbalskill good for english and writingskill excellent verbalskill excellent 
for spanish UserB has taken courseA, courseF, courseG and courseH and has 
writingskill fluent verbalskill fluent for english and writingskill good 
verbalskill good for italian

I would like to put this data into solr so I can search for all "users how have 
taken courseA and are fluent in english".
Can I do that?

The problem is I'm not sure how to flatten this database into a schema. It's 
easy to understand the users column, for example
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure how the schema should look like for courses and 
languages
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />


Thanks for any help
/Niklas


SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi Dariusz,
To me this example has one table, "user", while I have many tables that connect 
to one user, and that is what I'm unsure how to do.

/Niklas


-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com] 
Sent: 11 January 2013 14:56
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

Hi Niklas,

Maybe this link helps:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.



On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig < niklas.lang...@globesoft.com> 
wrote:

> Hi!
> I'm quite new to solr and trying to understand how to create a schema 
> from how our postgres database and then search for the content in solr 
> instead of querying the db.
>
> My question should be really easy, it has most likely been asked many 
> times but still I'm not able to google any answer to it.
>
> To make it easy, I have 3 columns: users, courses and languages
>
> Users has columns , userid, firstname, lastname Courses has column 
> coursename, startdate, enddate Languages has column language, 
> writingskill, verbalskill
>
> UserA has taken courseA, courseB and courseC and has writingskill good 
> verbalskill good for english and writingskill excellent verbalskill 
> excellent for spanish UserB has taken courseA, courseF, courseG and 
> courseH and has writingskill fluent verbalskill fluent for english and 
> writingskill good verbalskill good for italian
>
> I would like to put this data into solr so I can search for all "users 
> how have taken courseA and are fluent in english".
> Can I do that?
>
> The problem is I'm not sure how to flatten this database into a schema 
> It's easy to understand the users column, for example
> <field name="userid" type="string" indexed="true" />
> <field name="firstname" type="string" indexed="true" />
> <field name="lastname" type="string" indexed="true" />
>
> But then I'm not so sure how the schema should look like for courses 
> and languages
> <field name="coursename" type="string" indexed="true" />
> <field name="startdate" type="string" indexed="true" />
> <field name="enddate" type="string" indexed="true" />
> <field name="language" type="string" indexed="true" />
> <field name="writingskill" type="string" indexed="true" />
> <field name="verbalskill" type="string" indexed="true" />
>
>
> Thanks for any help
> /Niklas
>


Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi,

No, it actually has two tables, User and Item. The example shown on the
blog is for one table; you repeat the same thing for the other
table. Only your data-import.xml file changes. For the rest, just copy and
paste it into the conf directory. If you are running your Solr on Linux, then
you can work with symlinks.

D.



On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig <
niklas.lang...@globesoft.com> wrote:

> Hi Dariusz,
> To me this  example has one table "user" and I have many tables that
> connects to one user and that is what I'm unsure how how to do.
>
> /Niklas
>
>
> -Original message-
> From: Dariusz Borowski [mailto:darius...@gmail.com]
> Sent: 11 January 2013 14:56
> To: solr-user@lucene.apache.org
> Subject: Re: configuring schema to match database
>
> Hi Niklas,
>
> Maybe this link helps:
>
> http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/
>
> D.
>
>
>
> On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig <
> niklas.lang...@globesoft.com> wrote:
>
> > Hi!
> > I'm quite new to solr and trying to understand how to create a schema
> > from how our postgres database and then search for the content in solr
> > instead of querying the db.
> >
> > My question should be really easy, it has most likely been asked many
> > times but still I'm not able to google any answer to it.
> >
> > To make it easy, I have 3 columns: users, courses and languages
> >
> > Users has columns , userid, firstname, lastname Courses has column
> > coursename, startdate, enddate Languages has column language,
> > writingskill, verbalskill
> >
> > UserA has taken courseA, courseB and courseC and has writingskill good
> > verbalskill good for english and writingskill excellent verbalskill
> > excellent for spanish UserB has taken courseA, courseF, courseG and
> > courseH and has writingskill fluent verbalskill fluent for english and
> > writingskill good verbalskill good for italian
> >
> > I would like to put this data into solr so I can search for all "users
> > how have taken courseA and are fluent in english".
> > Can I do that?
> >
> > The problem is I'm not sure how to flatten this database into a schema
> > It's easy to understand the users column, for example
> > <field name="userid" type="string" indexed="true" />
> > <field name="firstname" type="string" indexed="true" />
> > <field name="lastname" type="string" indexed="true" />
> >
> > But then I'm not so sure how the schema should look like for courses
> > and languages
> > <field name="coursename" type="string" indexed="true" />
> > <field name="startdate" type="string" indexed="true" />
> > <field name="enddate" type="string" indexed="true" />
> > <field name="language" type="string" indexed="true" />
> > <field name="writingskill" type="string" indexed="true" />
> > <field name="verbalskill" type="string" indexed="true" />
> >
> >
> > Thanks for any help
> > /Niklas
> >
>


Re: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
Hmmm, it will not work for me. I want the "original" credentials 
forwarded in the sub-requests. The credentials are mapped to permissions 
(authorization), and basically I don't want a user to be able to have 
something done in the sub-requests (performed automatically by the 
contacted Solr node) that he is not authorized to do. Forwarding of the 
credentials is a must. So what you are saying is that I should expect to 
have to make some modifications to Solr in order to achieve what I want?


Regards, Per Steffensen

On 1/11/13 2:11 PM, Markus Jelsma wrote:

Hi,

If your credentials are fixed i would configure username:password in your 
request handler's shardHandlerFactory configuration section and then modify 
HttpShardHandlerFactory.init() to create a HttpClient with an AuthScope 
configured with those settings.

I don't think you can obtain the original credentials very easy when inside 
HttpShardHandlerFactory.

Cheers
  
-Original message-

From:Per Steffensen 
Sent: Fri 11-Jan-2013 13:07
To: solr-user@lucene.apache.org
Subject: Forwarding authentication credentials in internal node-to-node requests

Hi

I read http://wiki.apache.org/solr/SolrSecurity and know a lot about
webcontainer authentication and authorization. Im sure I will be able to
set it up so that each solr-node is will require HTTP authentication for
(selected) incoming requests.

But solr-nodes also make requests among each other and Im in doubt if
credentials are forwarded from the "original request" to the internal
sub-requests?
E.g. lets say that each solr-node is set up to require authentication
for search request. An "outside" user makes a distributed request
including correct username/password. Since it is a distributed search,
the node which handles the original request from the user will have to
make sub-requests to other solr-nodes but they also require correct
credentials in order to accept this sub-request. Are the credentials
from the original request duplicated to the sub-requests or what options
do I have?
Same thing goes for e.g. update requests if they are sent to a node
which does not run (all) the replica of the shard in which the documents
to be added/updated/deleted belong. The node needs to make sub-request
to other nodes, and it will require forwarding the credentials.

Does this just work out of the box, or ... ?

Regards, Per Steffensen





Re: Reading properties in data-import.xml

2013-01-11 Thread Dariusz Borowski
Thanks Alex!

This brought me to the solution I wanted to achieve. :)
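
In short, the pattern is roughly this (a sketch; property and parameter names 
are mine): define the values in conf/solrcore.properties, pass them through the 
DIH handler defaults in solrconfig.xml (where core properties are substituted), 
and read them in the data config via ${dataimporter.request.*}:

# conf/solrcore.properties
dburl=jdbc:mysql://localhost:3306/projectX
dbuser=someuser
dbpassword=secret

<!-- solrconfig.xml -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-import.xml</str>
    <str name="dburl">${dburl}</str>
    <str name="dbuser">${dbuser}</str>
    <str name="dbpassword">${dbpassword}</str>
  </lst>
</requestHandler>

<!-- data-import.xml -->
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="${dataimporter.request.dburl}"
            user="${dataimporter.request.dbuser}"
            password="${dataimporter.request.dbpassword}" />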

D.



On Thu, Jan 10, 2013 at 3:21 PM, Alexandre Rafalovitch
wrote:

> dataimport.properties is for DIH to store it's own properties for delta
> processing and things. Try solrcore.properties instead, as per recent
> discussion:
>
> http://lucene.472066.n3.nabble.com/Reading-database-connection-properties-from-external-file-td4031154.html
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Jan 10, 2013 at 3:58 AM, Dariusz Borowski wrote:
>
> > I'm having a problem using a property file in my data-import.xml file.
> >
> > My aim is to not hard code some values inside my xml file, but rather
> > reusing the values from a property file. I'm using multicore and some of
> > the values are being changed from time to time and I do not want to
> change
> > them in all my data-import files.
> >
> > For example:
> > <dataSource
> >     type="JdbcDataSource"
> >     driver="com.mysql.jdbc.Driver"
> >     url="jdbc:mysql://${host}:3306/projectX"
> >     user="${username}"
> >     password="${password}" />
> > password="${password}" />
> >
> > I tried everything, but don't know how I can use proporties here. I tried
> > to put my values in dataimport.properties, located under "SOLR-HOME/conf"
> > and under "SOLR-HOME/core1/conf", but without any success.
> >
> > Please, could someone help me on this?
> >
>


SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Ahh sorry,
now I understand.
OK, seems like a good solution; I just now need to understand how to query 
multiple cores :)

-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com] 
Sent: 11 January 2013 15:15
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

Hi,

No, it has actually two tables. User and Item. The example shown on the blog is 
for one table, because you repeat the same thing for the other table. Only your 
data-import.xml file changes. For the rest, just copy and paste it in the conf 
directory. If you are running your solr in Linux, then you can work with 
symlinks.

D.



On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig < niklas.lang...@globesoft.com> 
wrote:

> Hi Dariusz,
> To me this  example has one table "user" and I have many tables that 
> connects to one user and that is what I'm unsure how how to do.
>
> /Niklas
>
>
> -Original message-
> From: Dariusz Borowski [mailto:darius...@gmail.com]
> Sent: 11 January 2013 14:56
> To: solr-user@lucene.apache.org
> Subject: Re: configuring schema to match database
>
> Hi Niklas,
>
> Maybe this link helps:
>
> http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/
>
> D.
>
>
>
> On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig < 
> niklas.lang...@globesoft.com> wrote:
>
> > Hi!
> > I'm quite new to solr and trying to understand how to create a 
> > schema from how our postgres database and then search for the 
> > content in solr instead of querying the db.
> >
> > My question should be really easy, it has most likely been asked 
> > many times but still I'm not able to google any answer to it.
> >
> > To make it easy, I have 3 columns: users, courses and languages
> >
> > Users has columns , userid, firstname, lastname Courses has column 
> > coursename, startdate, enddate Languages has column language, 
> > writingskill, verbalskill
> >
> > UserA has taken courseA, courseB and courseC and has writingskill 
> > good verbalskill good for english and writingskill excellent 
> > verbalskill excellent for spanish UserB has taken courseA, courseF, 
> > courseG and courseH and has writingskill fluent verbalskill fluent 
> > for english and writingskill good verbalskill good for italian
> >
> > I would like to put this data into solr so I can search for all 
> > "users how have taken courseA and are fluent in english".
> > Can I do that?
> >
> > The problem is I'm not sure how to flatten this database into a 
> > schema. It's easy to understand the users column, for example
> > <field name="userid" type="string" indexed="true" />
> > <field name="firstname" type="string" indexed="true" />
> > <field name="lastname" type="string" indexed="true" />
> >
> > But then I'm not so sure how the schema should look like for courses 
> > and languages
> > <field name="coursename" type="string" indexed="true" />
> > <field name="startdate" type="string" indexed="true" />
> > <field name="enddate" type="string" indexed="true" />
> > <field name="language" type="string" indexed="true" />
> > <field name="writingskill" type="string" indexed="true" />
> > <field name="verbalskill" type="string" indexed="true" />
> >
> >
> > Thanks for any help
> > /Niklas
> >
>


Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
I don't know how to query multiple cores, or whether it's possible at once; but
otherwise I would create a JOIN SQL query if you need values from multiple
tables.

D.



On Fri, Jan 11, 2013 at 3:27 PM, Niklas Langvig <
niklas.lang...@globesoft.com> wrote:

> Ahh sorry,
> Now I understand,
> Ok seems like a good solution, I just know need to understand how to query
> multiple cores now :)
>
> -Original message-
> From: Dariusz Borowski [mailto:darius...@gmail.com]
> Sent: 11 January 2013 15:15
> To: solr-user@lucene.apache.org
> Subject: Re: configuring schema to match database
>
> Hi,
>
> No, it has actually two tables. User and Item. The example shown on the
> blog is for one table, because you repeat the same thing for the other
> table. Only your data-import.xml file changes. For the rest, just copy and
> paste it in the conf directory. If you are running your solr in Linux, then
> you can work with symlinks.
>
> D.
>
>
>
> On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig <
> niklas.lang...@globesoft.com> wrote:
>
> > Hi Dariusz,
> > To me this  example has one table "user" and I have many tables that
> > connects to one user and that is what I'm unsure how how to do.
> >
> > /Niklas
> >
> >
> > -Original message-
> > From: Dariusz Borowski [mailto:darius...@gmail.com]
> > Sent: 11 January 2013 14:56
> > To: solr-user@lucene.apache.org
> > Subject: Re: configuring schema to match database
> >
> > Hi Niklas,
> >
> > Maybe this link helps:
> >
> > http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/
> >
> > D.
> >
> >
> >
> > On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig <
> > niklas.lang...@globesoft.com> wrote:
> >
> > > Hi!
> > > I'm quite new to solr and trying to understand how to create a
> > > schema from how our postgres database and then search for the
> > > content in solr instead of querying the db.
> > >
> > > My question should be really easy, it has most likely been asked
> > > many times but still I'm not able to google any answer to it.
> > >
> > > To make it easy, I have 3 columns: users, courses and languages
> > >
> > > Users has columns , userid, firstname, lastname Courses has column
> > > coursename, startdate, enddate Languages has column language,
> > > writingskill, verbalskill
> > >
> > > UserA has taken courseA, courseB and courseC and has writingskill
> > > good verbalskill good for english and writingskill excellent
> > > verbalskill excellent for spanish UserB has taken courseA, courseF,
> > > courseG and courseH and has writingskill fluent verbalskill fluent
> > > for english and writingskill good verbalskill good for italian
> > >
> > > I would like to put this data into solr so I can search for all
> > > "users how have taken courseA and are fluent in english".
> > > Can I do that?
> > >
> > > The problem is I'm not sure how to flatten this database into a
> > > schema. It's easy to understand the users column, for example
> > > <field name="userid" type="string" indexed="true" />
> > > <field name="firstname" type="string" indexed="true" />
> > > <field name="lastname" type="string" indexed="true" />
> > >
> > > But then I'm not so sure how the schema should look like for courses
> > > and languages
> > > <field name="coursename" type="string" indexed="true" />
> > > <field name="startdate" type="string" indexed="true" />
> > > <field name="enddate" type="string" indexed="true" />
> > > <field name="language" type="string" indexed="true" />
> > > <field name="writingskill" type="string" indexed="true" />
> > > <field name="verbalskill" type="string" indexed="true" />
> > >
> > >
> > > Thanks for any help
> > > /Niklas
> > >
> >
>


Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 19:57, Niklas Langvig  wrote:
> Ahh sorry,
> Now I understand,
> Ok seems like a good solution, I just know need to understand how to query 
> multiple cores now :)

There is no need to use multiple cores in your setup. Going
back to your original problem statement, it can easily be
handled with a single core, and it actually makes more sense
to do it that way. You will need to give us more details.

>> > My question should be really easy, it has most likely been asked
>> > many times but still I'm not able to google any answer to it.
>> >
>> > To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as
having columns. How are the tables connected? Is there a
foreign key relationship between them? Is the relationship
one-to-one, one-to-many, or what?

>> > Users has columns , userid, firstname, lastname Courses has column
>> > coursename, startdate, enddate Languages has column language,
>> > writingskill, verbalskill
[...]
>> > I would like to put this data into solr so I can search for all
>> > "users how have taken courseA and are fluent in english".
>> > Can I do that?

1. Your schema for the single core is quite straightforward,
and along the lines of what you had described (one field for
each database column in each table), e.g.,

<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />
Pay attention to the type. Dates should typically be solr.DateField.
The others can be strings, but if they are integers in the database,
you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as an RDBMS. Instead, one
flattens out data from a typical RDBMS structure. It is difficult
to give you complete instructions unless you describe the database
relationships, but, e.g., if one has userA with course1, course2,
and course3, and userB with course2, course4, the Solr documents
would be:

  <doc> userid=userA, coursename=course1, ... </doc>
  <doc> userid=userA, coursename=course2, ... </doc>
  <doc> userid=userA, coursename=course3, ... </doc>
  <doc> userid=userB, coursename=course2, ... </doc>
  <doc> userid=userB, coursename=course4, ... </doc>
This scheme could also be extended to languages, depending
on how the tables are related.

3. While indexing into Solr, one has to select from the database,
and flatten out the data as above. The two main ways of
doing this are using a library like SolrJ for Java (other languages
have other libraries, e.g., django-haystack is easy to get started
with if one is using Python/Django), or the Solr DataImportHandler
(please see http://wiki.apache.org/solr/DataImportHandler ) with
nested entities.
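
A bare-bones sketch of such a DIH configuration (connection details are 
assumed; table and column names are taken from your description):

<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/yourdb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="user"
            query="SELECT userid, firstname, lastname FROM users">
      <entity name="course"
              query="SELECT coursename, startdate, enddate FROM courses WHERE userid = '${user.userid}'"/>
      <entity name="language"
              query="SELECT language, writingskill, verbalskill FROM languages WHERE userid = '${user.userid}'"/>
    </entity>
  </document>
</dataConfig>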

4. With such a structure, querying Solr should be simple.

Regards,
Gora


SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
It sounds good not to use more than one core; for sure I do not want to 
overcomplicate this.

Yes I meant tables.
It's pretty simple.

Both the courses and languages tables have their own primary keys, courseseqno 
and languagesseqno.
Both also have a foreign key "userid" that references the users table's userid 
column.
The relationship from users to courses and languages is one-to-many.

But I guess I'm thinking wrong, because my idea would be to have a "block" of 
fields connected with one id:

<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />

These three are connected with a

<field name="courseseqno" type="string" indexed="true" />

But also have a

<field name="userid" type="string" indexed="true" />

to connect to a specific user?

Thanks
/Niklas



-Original message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: 11 January 2013 15:55
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

On 11 January 2013 19:57, Niklas Langvig  wrote:
> Ahh sorry,
> Now I understand,
> Ok seems like a good solution, I just know need to understand how to 
> query multiple cores now :)

There is no need to use multiple cores in your setup. Going back to your 
original problem statement, it can easily be handled with a single core, and it 
actually makes more sense to do it that way. You will need to give us more 
details.

>> > My question should be really easy, it has most likely been asked 
>> > many times but still I'm not able to google any answer to it.
>> >
>> > To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as having columns. How 
are the tables connected? Is there a foreign key relationship between them? Is 
the relationship one-to-one, one-to-many, or what?

>> > Users has columns , userid, firstname, lastname Courses has column 
>> > coursename, startdate, enddate Languages has column language, 
>> > writingskill, verbalskill
[...]
>> > I would like to put this data into solr so I can search for all 
>> > "users how have taken courseA and are fluent in english".
>> > Can I do that?

1. Your schema for the single core is quite straightforward,
and along the lines of what you had described (one field for
each database column in each table), e.g.,

<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />
Pay attention to the type. Dates should typically be solr.DateField.
The others can be strings, but if they are integers in the database,
you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as a RDBMS. Instead, one
flattens out data from a typical RDBMS structure. It is difficult
to give you complete instructions unless you describe the database
relationships, but, e.g., if one has userA with course1, course2,
and course3, and userB with course2, course4, the Solr documents
would be:

  <doc> userid=userA, coursename=course1, ... </doc>
  <doc> userid=userA, coursename=course2, ... </doc>
  <doc> userid=userA, coursename=course3, ... </doc>
  <doc> userid=userB, coursename=course2, ... </doc>
  <doc> userid=userB, coursename=course4, ... </doc>
This scheme could also be extended to languages, depending
on how the tables are related.

3. While indexing into Solr, one has to select from the database,
and flatten out the data as above. The two main ways of
doing this are using a library like SolrJ for Java (other languages
have other libraries, e.g., django-haystack is easy to get started
with if one is using Python/Django), or the Solr DataImportHandler
(please see http://wiki.apache.org/solr/DataImportHandler ) with
nested entities.

4. With such a structure, querying Solr should be simple.

Regards,
Gora


Re: Getting Files into Zookeeper

2013-01-11 Thread Mark Miller
It's a bug that you only see RuntimeException - in 4.1 you will get the real 
problem - which is likely around connecting to zookeeper. You might try with a 
single zk host in the zk host string initially. That might make it easier to 
track down why it won't connect. It's tough to diagnose because the root 
exception is being swallowed - it's likely a connect to zk failed exception 
though.

- Mark

On Jan 10, 2013, at 1:34 PM, Christopher Gross  wrote:

> I'm trying to get SolrCloud working with more than one configuration going.
>  I have the base schema that Solr 4 comes with, I'd like to push that and
> one from another project (it does have the _version_ field in it.)  I'm
> having difficulty figuring out how to push things into zookeeper, or if I'm
> even doing this right.
> 
> From the SolrCloud page, I'm trying this and I get an error --
> 
> $ java -classpath
> zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar
> org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost
> localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf
> Exception in thread "main" java.lang.RuntimeException
>     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
>     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83)
>     at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158)
> 
> Can anyone point me in the direction of some documentation or let me know
> if there's something that I'm missing?
> 
> Thanks!
> 
> -- Chris



Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Mark Miller

On Jan 10, 2013, at 12:06 PM, Shawn Heisey  wrote:

> On 1/9/2013 8:54 PM, Mark Miller wrote:
>> I'd put everything into one. You can upload different named sets of config 
>> files and point collections either to the same sets or different sets.
>> 
>> You can really think about it the same way you would setting up a single 
>> node with multiple cores. The main difference is that it's easier to share 
>> sets of config files across collections if you want to. You don't need to at 
>> all though.
>> 
>> I'm not sure if xinclude works with zk, but I don't think it does.
> 
> Thank you for your assistance.  I'll work on recombining my solrconfig.xml.  
> Are there any available full examples of how to set up and start both 
> zookeeper and Solr?  I'll be using the included Jetty 8.

I'm not sure - there are a few blog posts out there. The wiki does a decent job 
for Solr but doesn't get into ZooKeeper - the ZooKeeper site has a pretty simple 
setup guide, though.

> 
> Specific questions that have come to mind:
> 
> If I'm planning multiple collections with their own configs, do I still need 
> to bootstrap zookeeper when I start Solr, or should I start it up with the 
> zkHost parameter and then use the collection admin to upload information?  I 
> have not looked closely at the collection admin yet, I just know that it 
> exists.

Currently, there are two main options. Either use the bootstrap param on first 
startup or use the zkcli cmd line tool to upload config sets and link them to 
collections.

> 
> I have heard that if a replica node is down long enough that transaction logs 
> are not enough to fully fix that node, SolrCloud will initiate a full 
> replication.  Is that the case?  If so, is it necessary to configure the 
> replication handler with a specific path for the name, or does SolrCloud 
> handle that itself?

The replication handler should be defined as you see it in the default example 
solrconfig.xml file. Very bare bones.

> 
> Is there an option on updateLog that controls how many transactions are kept, 
> or is that managed automatically by SolrCloud?  I have read some things that 
> talk about 100 updates.  I expect updates on this to be extremely frequent 
> and small, so 100 updates isn't much, and I may want to increase that.

No option - 100 is it, as it has implications for the recovery strategy if it's 
raised. I'd like to see it configurable in the future, but that would require 
making some other knobs change as well, if I remember right.

> 
> Is it expected with future versions of Solr that I could upgrade one of my 
> nodes to 4.2 or 4.3 and have it work with the other node still at 4.1?  I 
> would also hope that would mean that the last 4.x release would work with 
> 5.0.  That would make it possible to do rolling upgrades with no downtime.

I don't think we have committed to anything here yet. Seems like something we 
need to hash out, but we have not wanted to be too limited initially. For 
example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some 
explanation and might require some down time.

- Mark

> 
> Thanks,
> Shawn
> 



RE: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Markus Jelsma
FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster. 
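
E.g. (a sketch; the included file name is ours):

<xi:include href="requestHandlers.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>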
 
-Original message-
> From:Mark Miller 
> Sent: Fri 11-Jan-2013 17:13
> To: solr-user@lucene.apache.org
> Subject: Re: Setting up new SolrCloud - need some guidance
> 
> 
> On Jan 10, 2013, at 12:06 PM, Shawn Heisey  wrote:
> 
> > On 1/9/2013 8:54 PM, Mark Miller wrote:
> >> I'd put everything into one. You can upload different named sets of config 
> >> files and point collections either to the same sets or different sets.
> >> 
> >> You can really think about it the same way you would setting up a single 
> >> node with multiple cores. The main difference is that it's easier to share 
> >> sets of config files across collections if you want to. You don't need to 
> >> at all though.
> >> 
> >> I'm not sure if xinclude works with zk, but I don't think it does.
> > 
> > Thank you for your assistance.  I'll work on recombining my solrconfig.xml. 
> >  Are there any available full examples of how to set up and start both 
> > zookeeper and Solr?  I'll be using the included Jetty 8.
> 
> I'm not sure - there are a few blog posts out there. The wiki does a decent 
> job for Solr but doesn't get in ZooKeeper - the ZooKeeper site has a pretty 
> simple setup guide though.
> 
> > 
> > Specific questions that have come to mind:
> > 
> > If I'm planning multiple collections with their own configs, do I still 
> > need to bootstrap zookeeper when I start Solr, or should I start it up with 
> > the zkHost parameter and then use the collection admin to upload 
> > information?  I have not looked closely at the collection admin yet, I just 
> > know that it exists.
> 
> Currently, there are two main options. Either use the bootstrap param on 
> first startup or use the zkcli cmd line tool to upload config sets and link 
> them to collections.
> 
> > 
> > I have heard that if a replica node is down long enough that transaction 
> > logs are not enough to fully fix that node, SolrCloud will initiate a full 
> > replication.  Is that the case?  If so, is it necessary to configure the 
> > replication handler with a specific path for the name, or does SolrCloud 
> > handle that itself?
> 
> The replication handler should be defined as you see it in the default 
> example solrconfig.xml file. Very bare bones.
> 
> > 
> > Is there an option on updateLog that controls how many transactions are 
> > kept, or is that managed automatically by SolrCloud?  I have read some 
> > things that talk about 100 updates.  I expect updates on this to be 
> > extremely frequent and small, so 100 updates isn't much, and I may want to 
> > increase that.
> 
> No option - 100 is it as it has implications on the recovery strategy if it's 
> raised. I'd like to see it configurable in the future, but would require make 
> some other knobs change as well if I remember right.
> 
> > 
> > Is it expected with future versions of Solr that I could upgrade one of my 
> > nodes to 4.2 or 4.3 and have it work with the other node still at 4.1?  I 
> > would also hope that would mean that the last 4.x release would work with 
> > 5.0.  That would make it possible to do rolling upgrades with no downtime.
> 
> I don't think we have committed to anything here yet. Seems like something we 
> need to hash out, but we have not wanted to be too limited initially. For 
> example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some 
> explanation and might require some down time.
> 
> - Mark
> 
> > 
> > Thanks,
> > Shawn
> > 
> 
> 


Re: Getting Files into Zookeeper

2013-01-11 Thread Christopher Gross
I changed it to only go to one Zookeeper (localhost:2181) and it still gave
me the same stack trace error.

I was eventually able to get around this -- I just used the "bootstrap"
arguments when starting up my Tomcat instances to push the configs over --
though I'd rather just do it externally from Tomcat in the future.
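
For reference, the bootstrap system properties in question were along these 
lines (paths are from our setup; a sketch):

-Dbootstrap_confdir=/solr/data/test/conf -Dcollection.configName=myconf -DzkHost=localhost:2181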

Thanks Mark.

-- Chris


On Fri, Jan 11, 2013 at 11:00 AM, Mark Miller  wrote:

> It's a bug that you only see RuntimeException - in 4.1 you will get the
> real problem - which is likely around connecting to zookeeper. You might
> try with a single zk host in the zk host string initially. That might make
> it easier to track down why it won't connect. It's tough to diagnose
> because the root exception is being swallowed - it's likely a connect to zk
> failed exception though.
>
> - Mark
>
> On Jan 10, 2013, at 1:34 PM, Christopher Gross  wrote:
>
> > I'm trying to get SolrCloud working with more than one configuration
> going.
> >  I have the base schema that Solr 4 comes with, I'd like to push that and
> > one from another project (it does have the _version_ field in it.)  I'm
> > having difficulty figuring out how to push things into zookeeper, or if
> I'm
> > even doing this right.
> >
> > From the SolrCloud page, I'm trying this and I get an error --
> >
> > $ java -classpath
> >
> zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar
> > org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost
> > localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf
> > Exception in thread "main" java.lang.RuntimeException
> >     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
> >     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83)
> >     at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158)
> >
> > Can anyone point me in the direction of some documentation or let me know
> > if there's something that I'm missing?
> >
> > Thanks!
> >
> > -- Chris
>
>


Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 21:13, Niklas Langvig  wrote:
> It sounds good not to use more than one core, for sure I do not want to over 
> complicate this.
[...]

Yes, not only are multiple cores unnecessarily complicated here,
your searches will also be less complex, and faster.

> Both table courses and languages has it's own primary key courseseqno and 
> languagesseqno

There is no need to index these.

> Both also have a foreign key "userid" that references the users table with 
> column userid
> The relationship from users to courses and languages are one-to-many.

> but I guess I'm thinking wrong because my idea would be to have a "block" 
> of fields connected with one id
>
> <field name="coursename" type="string" indexed="true" />
> <field name="startdate" type="string" indexed="true" />
> <field name="enddate" type="string" indexed="true" />
>
> These three are connected with a
> <field name="courseseqno" type="string" indexed="true" />
> But also have a
> <field name="userid" type="string" indexed="true" />
> To connect to a specific user?
[...]

You are still thinking of Solr as an RDBMS, where you should not
be. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents
  <doc> userid=userA, coursename=c1, language=l1, ... </doc>
  <doc> userid=userA, coursename=c1, language=l2, ... </doc>
  <doc> userid=userA, coursename=c2, language=l1, ... </doc>
  ...
  <doc> userid=userB, coursename=c2, language=l2, ... </doc>
  <doc> userid=userB, coursename=c3, language=l2, ... </doc>
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB

In order to get this form of flattened data into Solr, I would
suggest using the DataImportHandler with nested entities.
Please see the earlier link to DIH. Also, a Google search
for Solr dataimporthandler nested entities turns up many
examples, including:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/
Please give it a try, and post here with your attempts if
you run into any issues.

Regards,
Gora


How to disable/clear filterCache (from SolrIndexSearcher) in a custom searchComponent

2013-01-11 Thread radu

Hello, and thank you in advance for your help!

*Context:*
I have implemented a custom search component that receives 3 parameters: 
field, termValue and payloadX.
The component should search for *termValue* in the requested Lucene 
field and, for each match, check for *payloadX* in the term's 
associated payload information.


*Constraints:*
I don't want to disable filterCache from solconfig.xml the class="solr.FastLRUCache" > since I have other searchComponents that 
could use the filterCache.


I have implemented the payload search using SpanTermQuery and "attached" it 
to q=field:termValue:

public class MySearchComponent extends XPatternsSearchComponent {

    public void prepare(ResponseBuilder rb) {
        ...
        rb.setQueryString(parameters.get(CommonParams.Q));
        ...
    }

    public void process(ResponseBuilder rb) {
        ...
        SolrIndexSearcher.QueryResult queryResult = new SolrIndexSearcher.QueryResult(); // ??? question for help

        // search for the payload criteria in the payload of a specific field for a specific term
        CustomSpanTermQuery customFilterQuery = new CustomSpanTermQuery(field, term, payload);
        QueryCommand queryCommand = rb.getQueryCommand().setFilterList(customFilterQuery);

        rb.req.getSearcher().search(queryResult, queryCommand);
        ...
    }
}

*Issue:*
If I call the search component with field1, termValue1 and:
 - *payload1* (the first search): the result from filtering is 
saved in the filterCache.
 - *payload2* (the second time): the results from the first 
search (from the filterCache) are returned, and not the different, expected result set.


*Findings:*
I noticed that in SolrIndexSearcher, filterCache is private, so I cannot 
change/clear it through inheritance.
I also tried to use rb.getQueryCommand().replaceFlags(), but 
SolrIndexSearcher.NO_CHECK_FILTERCACHE|NO_CHECK_QCACHE|NO_SET_QCACHE are 
not public either.


*Question:*
How can I disable/clear the filterCache (from SolrIndexSearcher) *only* 
for a custom search component?

Do I have other options/approaches?

Best regards,
Radu


RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hmm, you need to set up the HttpClient in HttpShardHandlerFactory, but you 
cannot access the HttpServletRequest from there; it is only available in 
SolrDispatchFilter AFAIK. And even then, the HttpServletRequest can only return 
the remote user name, not the password he, she or it provided. I don't know how 
to obtain the password.
 
-Original message-
> From:Per Steffensen 
> Sent: Fri 11-Jan-2013 15:28
> To: solr-user@lucene.apache.org
> Subject: Re: Forwarding authentication credentials in internal node-to-node 
> requests
> 
> Hmmm, it will not work for me. I want the "original" credential 
> forwarded in the sub-requests. The credentials are mapped to permissions 
> (authorization), and basically I dont want a user to be able have 
> something done in the (automatically performed by the contacted 
> solr-node) sub-requests that he is not authorized to do. Forward of 
> credentials is a must. So what you are saying is that I should expect to 
> have to do some modifications to Solr in order to achieve what I want?
> 
> Regards, Per Steffensen
> 
> On 1/11/13 2:11 PM, Markus Jelsma wrote:
> > Hi,
> >
> > If your credentials are fixed i would configure username:password in your 
> > request handler's shardHandlerFactory configuration section and then modify 
> > HttpShardHandlerFactory.init() to create a HttpClient with an AuthScope 
> > configured with those settings.
> >
> > I don't think you can obtain the original credentials very easy when inside 
> > HttpShardHandlerFactory.
> >
> > Cheers
> >   
> > -Original message-
> >> From:Per Steffensen 
> >> Sent: Fri 11-Jan-2013 13:07
> >> To: solr-user@lucene.apache.org
> >> Subject: Forwarding authentication credentials in internal node-to-node 
> >> requests
> >>
> >> Hi
> >>
> >> I read http://wiki.apache.org/solr/SolrSecurity and know a lot about
> >> webcontainer authentication and authorization. Im sure I will be able to
> >> set it up so that each solr-node is will require HTTP authentication for
> >> (selected) incoming requests.
> >>
> >> But solr-nodes also make requests among each other and Im in doubt if
> >> credentials are forwarded from the "original request" to the internal
> >> sub-requests?
> >> E.g. lets say that each solr-node is set up to require authentication
> >> for search request. An "outside" user makes a distributed request
> >> including correct username/password. Since it is a distributed search,
> >> the node which handles the original request from the user will have to
> >> make sub-requests to other solr-nodes but they also require correct
> >> credentials in order to accept this sub-request. Are the credentials
> >> from the original request duplicated to the sub-requests or what options
> >> do I have?
> >> Same thing goes for e.g. update requests if they are sent to a node
> >> which does not run (all) the replica of the shard in which the documents
> >> to be added/updated/deleted belong. The node needs to make sub-request
> >> to other nodes, and it will require forwarding the credentials.
> >>
> >> Does this just work out of the box, or ... ?
> >>
> >> Regards, Per Steffensen
> >>
> 
> 


Re: link on graph page

2013-01-11 Thread Mark Miller
They point to the admin UI - or should - that seems right?

- Mark

On Jan 11, 2013, at 10:57 AM, Christopher Gross  wrote:

> I've managed to get my SolrCloud set up to have 2 different indexes up and
> running.  However, my URLs aren't right.  They just point to
> http://server:port/solr, not http://server:port/solr/index1 or
> http://server:port/solr/index2.
> 
> Is that something that I can set in my solr.xml for that Solr instance, or
> is it something that I'd have to set in each one's solrconfig.xml?
> 
> Any help would be appreciated.  Thanks!
> 
> -- Chris



Re: configuring schema to match database

2013-01-11 Thread Jens Grivolla

On 01/11/2013 05:23 PM, Gora Mohanty wrote:

You are still thinking of Solr as a RDBMS, where you should not
be. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents
  ... ...
  ... ...
  ... ...

  ... ...
  ... ...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB


Actually, that is what you would get when doing a join in an RDBMS, the 
cross-product of your tables. This is NOT AT ALL what you typically do 
in Solr.


Best start the other way around, think of Solr as a retrieval system, 
not a storage system. What are your queries? What do you want to find, 
and what criteria do you use to search for it?


If your intention is to find users that match certain criteria, each 
entry should be a user (with ALL associated information, e.g. all 
courses, all language skills, etc.), if you want to retrieve courses, 
each entry should be a course.


Let's say you want to find users who have certain language skills, you 
would have a schema that describes a user:

- user id
- user name
- languages
- ...

In languages, you could store e.g. things like en|reading|high, 
es|writing|low, etc. It could be a multivalued field, or just have 
everything separated by spaces with a tokenizer that splits on whitespace.

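To make that concrete, a minimal sketch of the field definition, assuming the 
stock whitespace-tokenized text_ws field type from the example schema:

  <field name="languages" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>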

Now you can query:

- language:es* -- return all users with some spanish skills
- language:en|writing|high -- return all users with high english writing 
skills
- +(language:es* language:fr*) +language:en|writing|high -- return users 
with high english writing skills and some knowledge of french or spanish


If you want to avoid wildcard queries (more costly) you can just add 
plain "en" and "es", etc. to your field so "language:es" will match 
anybody with spanish skills.


Best,
Jens



Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 22:30, Jens Grivolla  wrote:
[...]
> Actually, that is what you would get when doing a join in an RDBMS, the 
> cross-product of your tables. This is NOT AT ALL what you typically do in 
> Solr.
>
> Best start the other way around, think of Solr as a retrieval system, not a 
> storage system. What are your queries? What do you want to find, and what 
> criteria do you use to search for it?
[...]

Um, he did describe his desired queries, and there was a reason
that I proposed the above schema design.

> > UserA has taken courseA, courseB and courseC and has writingskill
> > good verbalskill good for english and writingskill excellent
> > verbalskill excellent for spanish UserB has taken courseA, courseF,
> > courseG and courseH and has writingskill fluent verbalskill fluent
> > for english and writingskill good verbalskill good for italian

Unless the index is becoming huge, I feel that it is better to
flatten everything out rather than combine fields, and
post-process the results.

Regards,
Gora


Re: Solr 4.0, slow opening searchers

2013-01-11 Thread Alan Woodward
Hi Marcel,

Are you committing data with hard commits or soft commits?  I've seen systems 
where we've inadvertently only used soft commits, which means that the entire 
transaction log has to be re-read on startup, which can take a long time.  Hard 
commits flush indexed data to disk, and make it a lot quicker to restart.
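
If hard commits do turn out to be missing, a minimal autoCommit sketch for 
solrconfig.xml (the interval is illustrative, not a recommendation):

  <autoCommit>
    <!-- Hard-commit at most every 15 seconds, flushing indexed data to
         disk without the cost of reopening a searcher each time. -->
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>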

Alan Woodward
a...@flax.co.uk


On 11 Jan 2013, at 13:51, Marcel Bremer wrote:

> Hi,
> 
> We're experiencing slow startup times of searchers in Solr when containing a 
> large number of documents.
> 
> We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, 
> spread across 9 cores. These documents contain keywords, with additional 
> statistics, which we are using for suggestions and related keywords. When we 
> (re)start Solr on one of our servers it can take up to two hours before Solr 
> has opened all of its searchers and starts accepting connections again. We 
> can't figure out why it takes so long to open those searchers. Also the CPU 
> and memory usage of Solr while opening searchers is not extremely high.
> 
> Are there any known issues or tips someone could give us to speed up opening 
> searchers?
> 
> If you need more details, please ping me.
> 
> 
> Best regards,
> 
> Marcel Bremer
> Vinden.nl BV



how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
My delta-import
(http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does
not correctly update my solr fields.


Please see my data-config here:



  


Now when a new item is inserted into [freemedialikes]
and I perform a delta-import, the Solr index does not show the total new
amount of likes. Only after I perform a full-import
(http://localhost:8983/solr/freemedia/dataimport?command=full-import) the
correct number is shown.
So the SQL is returning the correct results, I just don't know how to get
the updated likes count via the delta-import.

I have reloaded the data-config every time I made a change. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Shawn Heisey

On 1/11/2013 9:15 AM, Markus Jelsma wrote:

FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster.


Good to know.  I'm still deciding whether I want to recombine or 
continue to use xinclude.  Is the xinclude path relative to 
solrconfig.xml just as it is now, so I could link to 
include/indexConfig.xml?  Are things partitioned well enough that one 
collection's config will not overlap into another config when using 
xinclude and relative paths?

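To be concrete, the relative includes I have in mind would look something like 
this inside solrconfig.xml (the path is illustrative):

  <config>
    <!-- Resolved relative to the location of the including file -->
    <xi:include href="include/indexConfig.xml"
                xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </config>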

The way I do things now, all files in cores/corename/conf (relative to 
solr.home) are symlinks, such as solrconfig.xml -> 
../../../config/X/solrconfig.xml, where X is a general 
designation for a type of config.  I have good separation between 
instanceDir, data, and real config files.  The paths in the xinclude 
elements are relative to the location of the symlink.


Thanks,
Shawn



RE: how to perform a delta-import when related table is updated

2013-01-11 Thread Dyer, James
Peter,

See http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command , 
then scroll down to where it says "The deltaQuery in the above example only 
detects changes in item but not in other tables..."  It shows you two ways to 
do it.

Option 1:  add a reference to the last_modified_date (or whatever) from the 
child table in a "where-in" clause in the parent entity's "deltaQuery".

Option 2:  add a "parentDeltaQuery" on the child entity.  This is a query that 
tells DIH which parent-table keys need to update because of child table 
updates.  In other words, say your child's Delta Query says that child_id=1 
changed.  You might have for parentDeltaQuery something like: SELECT ID FROM 
PARENT P WHERE P.CHILD_ID=${Child.ID} .  While this can simplify things for you 
and prevent you from not needing giant "where-in" clauses on the parent query, 
it will double the number of queries that get issued to determine which 
documents to update.
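
A rough sketch of Option 2 for your tables might look like this (the title 
column and an updatedate last-modified column on freemedia are assumptions; 
createdate comes from your child table):

  <entity name="freemedia" pk="id"
          query="SELECT id, title FROM freemedia"
          deltaImportQuery="SELECT id, title FROM freemedia
                            WHERE id='${dih.delta.id}'"
          deltaQuery="SELECT id FROM freemedia
                      WHERE updatedate &gt; '${dih.last_index_time}'">
    <!-- parentDeltaQuery maps each changed child row back to the
         parent id that needs re-indexing -->
    <entity name="likes"
            query="SELECT COUNT(*) AS likescount FROM freemedialikes
                   WHERE freemediaid=${freemedia.id}"
            deltaQuery="SELECT id, freemediaid FROM freemedialikes
                        WHERE createdate &gt; '${dih.last_index_time}'"
            parentDeltaQuery="SELECT id FROM freemedia
                              WHERE id=${likes.freemediaid}"/>
  </entity>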

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Friday, January 11, 2013 12:02 PM
To: solr-user@lucene.apache.org
Subject: how to perform a delta-import when related table is updated

My delta-import
(http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does
not correctly update my solr fields.


Please see my data-config here:



  


Now when a new item is inserted into [freemedialikes]
and I perform a delta-import, the Solr index does not show the total new
amount of likes. Only after I perform a full-import
(http://localhost:8983/solr/freemedia/dataimport?command=full-import) the
correct number is shown.
So the SQL is returning the correct results, I just don't know how to get
the updated likes count via the delta-import.

I have reloaded the data-config every time I made a change. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Hi James,

Ok, so I did this:


I now get this error in the logfile:


SEVERE: Delta Import Failed
java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='ID'



Now, my table looks like this:  


CREATE TABLE [dbo].[freemedialikes](
[id] [int] IDENTITY(1,1) NOT NULL,
[userid] [nvarchar](50) NOT NULL,
[freemediaid] [int] NOT NULL,
[createdate] [datetime] NOT NULL,
 CONSTRAINT [PK_freemedialikes] PRIMARY KEY CLUSTERED 
(
[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[freemedialikes]  WITH CHECK ADD  CONSTRAINT
[FK_freemedialikes_freemedia] FOREIGN KEY([freemediaid])
REFERENCES [dbo].[freemedia] ([id])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[freemedialikes] CHECK CONSTRAINT
[FK_freemedialikes_freemedia]
GO

ALTER TABLE [dbo].[freemedialikes] ADD  CONSTRAINT
[DF_freemedialikes_createdate]  DEFAULT (getdate()) FOR [createdate]
GO


So in the deltaQuery I thought I had to reference the freemediaid, like so:
"select freemediaid as id from freemedialikes"

Got the same error as above.
So then, since there was mention of a PK in the error, I thought I'd just
reference the PK of the child table, which didn't make sense, but hey :)
"select id from freemedialikes w"

But I got the same error again.


Any suggestions?
Thanks! 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032608.html
Sent from the Solr - User mailing list archive at Nabble.com.


Accessing raw index data

2013-01-11 Thread Achim Domma
Hi,

I have just setup my first Solr 4.0 instance and have added about one million 
documents. I would like to access the raw data stored in the index. Can 
somebody give me a starting point how to do that?

As a first step, a simple dump would be absolutely ok. I just want to play 
around and do some static offline analysis. In the long term, I probably would 
like to implement custom search components to enrich my search results. So if 
there's no export for raw data, I would be happy to learn how to implement 
custom handlers and/or search components. Some guidance where to start would be 
very appreciated.

kind regards,
Achim

Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 01:06, Achim Domma  wrote:
>
> Hi,
>
> I have just setup my first Solr 4.0 instance and have added about one
> million documents. I would like to access the raw data stored in the index.
> Can somebody give me a starting point how to do that?
>
> As a first step, a simple dump would be absolutely ok. I just want to play
> around and do some static offline analysis. In the long term, I probably
> would like to implement custom search components to enrich my search
> results. So if there's no export for raw data, I would be happy to learn how
> to implement custom handlers and/or search components. Some guidance where
> to start would be very appreciated.

It is not clear what you mean by "raw data", and what level of
customisation you are after. Here are two possibilities:
* At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.
* Also, Solr allows plugins for various components. This link might
  be of help, depending on the extent of customisation you are after:
  http://wiki.apache.org/solr/SolrPlugins

Maybe you should approach this from the other end: If you could
describe what you are trying to achieve, people might be able to
offer possibilities.

Regards,
Gora


Re: Accessing raw index data

2013-01-11 Thread Achim Domma
"At the base, Solr indexes are Lucene indexes, so one can always
 drop down to that level."

That's what I'm looking for. I understand that, at the end, there has to be an 
inverted index (or rather multiple of them), holding all "words" which occur 
in my documents, each "word" having a list of documents the "word" was part of. 
I would like to do some statistics based on this information, and would like to 
analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like "Data is stored in Lucene indexes, 
which are documented at XXX. In a request handler you can access the indexes 
via YYY.", I would be perfectly happy figuring out the rest on my own. 
Documentation about 4.0 is a bit limited, so it's hard to find an entry point.

cheers,
Achim

Am 11.01.2013 um 20:54 schrieb Gora Mohanty:

> On 12 January 2013 01:06, Achim Domma  wrote:
>> 
>> Hi,
>> 
>> I have just setup my first Solr 4.0 instance and have added about one
>> million documents. I would like to access the raw data stored in the index.
>> Can somebody give me a starting point how to do that?
>> 
>> As a first step, a simple dump would be absolutely ok. I just want to play
>> around and do some static offline analysis. In the long term, I probably
>> would like to implement custom search components to enrich my search
>> results. So if there's no export for raw data, I would be happy to learn how
>> to implement custom handlers and/or search components. Some guidance where
>> to start would be very appreciated.
> 
> It is not clear what you mean by "raw data", and what level of
> customisation you are after. Here are two possibilities:
> * At the base, Solr indexes are Lucene indexes, so one can always
>  drop down to that level.
> * Also, Solr allows plugins for various components. This link might
>  be of help, depending on the extent of customisation you are after:
>  http://wiki.apache.org/solr/SolrPlugins
> 
> Maybe you should approach this from the other end: If you could
> describe what you are trying to achieve, people might be able to
> offer possibilities.
> 
> Regards,
> Gora



Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 02:03, Achim Domma  wrote:
> "At the base, Solr indexes are Lucene indexes, so one can always
>  drop down to that level."
>
> That's what I'm looking for. I understand, that at the end, there has to be 
> an inverted index (or rather multiple of them), holding all "words" which 
> occur in my documents, each "word" having a list of documents the "word" 
> was part of. I would like to do some statistics based on this information, 
> would like to analyze how it changes if I change my text processing settings, 
> ...
>
> If you would give me a starting point like "Data is stored in Lucene indexes, 
> which are documented at XXX. In a request handler you can access the indexes 
> via YYY.", I would be perfectly happy figuring out the rest on my own. 
> Documentation about 4.0 is a bit limited, so it's hard to find an entry point.

Sadly, you have hit the limits of my knowledge: We
have not yet had the need to delve into details of
Lucene indexes, but I am sure that others can fill in.

Regards,
Gora


Re: Accessing raw index data

2013-01-11 Thread Alexandre Rafalovitch
Have you looked at Solr admin interface in details? Specifically, analysis
section under each core. It provides some of the statistics you seem to
want. And, gives you the source code to look at to understand how to create
your own version of that. Specifically, the "Luke" package is what you
might be looking for.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:33 PM, Achim Domma  wrote:

> "At the base, Solr indexes are Lucene indexes, so one can always
>  drop down to that level."
>
> That's what I'm looking for. I understand, that at the end, there has to
> be an inverted index (or rather multiple of them), holding all "words" which
> occur in my documents, each "word" having a list of documents the "word"
> was part of. I would like to do some statistics based on this information,
> would like to analyze how it changes if I change my text processing
> settings, ...
>
> If you would give me a starting point like "Data is stored in Lucene
> indexes, which are documented at XXX. In a request handler you can access
> the indexes via YYY.", I would be perfectly happy figuring out the rest on
> my own. Documentation about 4.0 is a bit limited, so it's hard to find an
> entry point.
>
> cheers,
> Achim
>
> Am 11.01.2013 um 20:54 schrieb Gora Mohanty:
>
> > On 12 January 2013 01:06, Achim Domma  wrote:
> >>
> >> Hi,
> >>
> >> I have just setup my first Solr 4.0 instance and have added about one
> >> million documents. I would like to access the raw data stored in the
> index.
> >> Can somebody give me a starting point how to do that?
> >>
> >> As a first step, a simple dump would be absolutely ok. I just want to
> play
> >> around and do some static offline analysis. In the long term, I probably
> >> would like to implement custom search components to enrich my search
> >> results. So if there's no export for raw data, I would be happy to
> learn how
> >> to implement custom handlers and/or search components. Some guidance
> where
> >> to start would be very appreciated.
> >
> > It is not clear what you mean by "raw data", and what level of
> > customisation you are after. Here are two possibilities:
> > * At the base, Solr indexes are Lucene indexes, so one can always
> >  drop down to that level.
> > * Also, Solr allows plugins for various components. This link might
> >  be of help, depending on the extent of customisation you are after:
> >  http://wiki.apache.org/solr/SolrPlugins
> >
> > Maybe you should approach this from the other end: If you could
> > describe what you are trying to achieve, people might be able to
> > offer possibilities.
> >
> > Regards,
> > Gora
>
>


RE: how to perform a delta-import when related table is updated

2013-01-11 Thread Dyer, James
Try adding the "pk" attribute to the parent entity in any of these 4 ways:

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Friday, January 11, 2013 1:18 PM
To: solr-user@lucene.apache.org
Subject: RE: how to perform a delta-import when related table is updated

Hi James,

Ok, so I did this:


I now get this error in the logfile:


SEVERE: Delta Import Failed
java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='ID'



Now, my table looks like this:  


CREATE TABLE [dbo].[freemedialikes](
[id] [int] IDENTITY(1,1) NOT NULL,
[userid] [nvarchar](50) NOT NULL,
[freemediaid] [int] NOT NULL,
[createdate] [datetime] NOT NULL,
 CONSTRAINT [PK_freemedialikes] PRIMARY KEY CLUSTERED 
(
[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[freemedialikes]  WITH CHECK ADD  CONSTRAINT
[FK_freemedialikes_freemedia] FOREIGN KEY([freemediaid])
REFERENCES [dbo].[freemedia] ([id])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[freemedialikes] CHECK CONSTRAINT
[FK_freemedialikes_freemedia]
GO

ALTER TABLE [dbo].[freemedialikes] ADD  CONSTRAINT
[DF_freemedialikes_createdate]  DEFAULT (getdate()) FOR [createdate]
GO


So in the deltaQuery I thought I had to reference the freemediaid, like so:
"select freemediaid as id from freemedialikes"

Got the same error as above.
So then, since there was mention of a PK in the error, I thought I'd just
reference the PK of the child table, which didn't make sense, but hey :)
"select id from freemedialikes w"

But I got the same error again.


Any suggestions?
Thanks! 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032608.html
Sent from the Solr - User mailing list archive at Nabble.com.




SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
I have a bit of a strange use case.

When I index a PDF to Solr I use ContentStreamUpdateRequest.
The Lucene document then contains in the "text" field all contained items
(the parsed items of the physical PDF).

I also need to add these parsed items to another Lucene document.

Is there a way to receive/parse these items just in memory, without
committing them to Lucene?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SloppyPhraseScorer behavior change

2013-01-11 Thread varun srivastava
Hi Jack,
 Is this a new change done in Solr 4.0? The autoGeneratePhraseQueries
option seems to have been present since Solr 3.1. Just wanted to confirm this is
the difference causing the change in behavior between 3.4 and 4.0.


Thanks
Varun

On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky wrote:

> Thanks. Sloppy phrase requires that the query terms be in a phrase, but
> you don't have any quotes in your query.
>
> Depending on your schema field type you may be running into a change in
> how auto-generated phrase queries are handled. It used to be that
> apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but
> now that is only true if your field type has autoGeneratePhraseQueries=true
> set. Now, if you don't have that option set, the term gets treated as
> (apple OR 0 OR ipad), which is a lot looser than the exact phrase.
>
> Look at the new example schema for the "text_en_splitting" field type as
> an example.
>
>
> -- Jack Krupansky
>
> -Original Message- From: varun srivastava
> Sent: Monday, December 24, 2012 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SloppyPhraseScorer behavior change
>
>
> Hi Jack,
> My query was simple /solr/select?query=ipad apple apple0ipad
> and doc contained "apple ipad" .
>
> If you see the patch attached with the bug 3215 , you will find following
> comment. I want to confirm whether the behaviour I am observing is in sync
> with what the patch developer intended or its just some regression bug. In
> solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
> honored, i.e. "apple ipad" and "ipad apple" both treated as same.
>
>
>
> ""
>
> /**
> +   * Score a candidate doc for all slop-valid position-combinations
> (matches)
> +   * encountered while traversing/hopping the PhrasePositions.
> +   *  The score contribution of a match depends on the distance:
> +   *  - highest score for distance=0 (exact match).
> +   *  - score gets lower as distance gets higher.
> +   * Example: for query "a b"~2, a document "x a b a y" can be
> scored twice:
> +   * once for "a b" (distance=0), and once for "b a" (distance=2).
> +   * Possibly not all valid combinations are encountered, because
> for efficiency
> +   * we always propagate the least PhrasePosition. This allows to base on
> +   * PriorityQueue and move forward faster.
> +   * As result, for example, document "a b c b a"
> +   * would score differently for queries "a b c"~4 and "c b a"~4, although
> +   * they really are equivalent.
> +   * Similarly, for doc "a b c b a f g", query "c b"~2
> +   * would get same score as "g f"~2, although "c b"~2 could be matched
> twice.
> +   * We may want to fix this in the future (currently not, for
> performance reasons).
> +   */
>
> ""
>
>
>
> On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky wrote:
>
>  Could you post the full query URL, so we can see exactly what your query
>> was? Or, post the output of &debug=query, which will show us what Lucene
>> query was generated.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: varun srivastava
>> Sent: Monday, December 24, 2012 1:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: SloppyPhraseScorer behavior change
>>
>>
>> Hi,
>>  Due to following bug fix
>> https://issues.apache.org/jira/browse/LUCENE-3215
>> observing a change
>> in behavior of SloppyPhraseScorer. I just wanted to
>> confirm my understanding with you all.
>>
>> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
>> then in solr 3.4 only query "a b" will match with document, but in solr
>> 3.5
>> onwards, both  query "a b" and "b a" will match. Is it right ?
>>
>>
>> Thanks
>> Varun
>>
>>
>


Re: SloppyPhraseScorer behavior change

2013-01-11 Thread varun srivastava
Moreover, I just checked: autoGeneratePhraseQueries="true" is set for both
3.4 and 4.0 in my schema.

Thanks
Varun

On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava wrote:

> Hi Jack,
>  Is this a new change done in Solr 4.0? The autoGeneratePhraseQueries
> option seems to have been present since Solr 3.1. Just wanted to confirm this is
> the difference causing the change in behavior between 3.4 and 4.0.
>
>
> Thanks
> Varun
>
>
> On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky 
> wrote:
>
>> Thanks. Sloppy phrase requires that the query terms be in a phrase, but
>> you don't have any quotes in your query.
>>
>> Depending on your schema field type you may be running into a change in
>> how auto-generated phrase queries are handled. It used to be that
>> apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but
>> now that is only true if your field type has autoGeneratePhraseQueries=true
>> set. Now, if you don't have that option set, the term gets treated as
>> (apple OR 0 OR ipad), which is a lot looser than the exact phrase.
>>
>> Look at the new example schema for the "text_en_splitting" field type as
>> an example.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: varun srivastava
>> Sent: Monday, December 24, 2012 5:49 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SloppyPhraseScorer behavior change
>>
>>
>> Hi Jack,
>> My query was simple /solr/select?query=ipad apple apple0ipad
>> and doc contained "apple ipad" .
>>
>> If you see the patch attached with the bug 3215 , you will find following
>> comment. I want to confirm whether the behaviour I am observing is in sync
>> with what the patch developer intended or its just some regression bug. In
>> solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
>> honored, i.e. "apple ipad" and "ipad apple" both treated as same.
>>
>>
>>
>> ""
>>
>> /**
>> +   * Score a candidate doc for all slop-valid position-combinations
>> (matches)
>> +   * encountered while traversing/hopping the PhrasePositions.
>> +   *  The score contribution of a match depends on the distance:
>> +   *  - highest score for distance=0 (exact match).
>> +   *  - score gets lower as distance gets higher.
>> +   * Example: for query "a b"~2, a document "x a b a y" can be
>> scored twice:
>> +   * once for "a b" (distance=0), and once for "b a" (distance=2).
>> +   * Possibly not all valid combinations are encountered, because
>> for efficiency
>> +   * we always propagate the least PhrasePosition. This allows to base on
>> +   * PriorityQueue and move forward faster.
>> +   * As result, for example, document "a b c b a"
>> +   * would score differently for queries "a b c"~4 and "c b a"~4,
>> although
>> +   * they really are equivalent.
>> +   * Similarly, for doc "a b c b a f g", query "c b"~2
>> +   * would get same score as "g f"~2, although "c b"~2 could be matched
>> twice.
>> +   * We may want to fix this in the future (currently not, for
>> performance reasons).
>> +   */
>>
>> ""
>>
>>
>>
>> On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky wrote:
>>
>>  Could you post the full query URL, so we can see exactly what your query
>>> was? Or, post the output of &debug=query, which will show us what Lucene
>>> query was generated.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: varun srivastava
>>> Sent: Monday, December 24, 2012 1:53 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: SloppyPhraseScorer behavior change
>>>
>>>
>>> Hi,
>>>  Due to following bug fix
>>> https://issues.apache.org/jira/browse/LUCENE-3215
>>> observing a change
>>> in behavior of SloppyPhraseScorer. I just wanted to
>>> confirm my understanding with you all.
>>>
>>> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d
>>> e",
>>> then in solr 3.4 only query "a b" will match with document, but in solr
>>> 3.5
>>> onwards, both  query "a b" and "b a" will match. Is it right ?
>>>
>>>
>>> Thanks
>>> Varun
>>>
>>>
>>
>


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Alexandre Rafalovitch
If I understand it, you are sending the file to Solr, which then uses the Tika
library to do the preprocessing/extraction and stores the results in the
defined fields.

If you don't want Solr to do the storing and want to change the extracted
fields, just use the Tika library in your client and work with the returned
document yourself. This also means less network load, as you don't
send the whole file over the wire.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:55 PM, uwe72  wrote:

> i have a bit strange usecase.
>
> when i index a pdf to solr i use ContentStreamUpdateRequest.
> The lucene document then contains in the "text" field all containing items
> (the parsed items of the physical pdf).
>
> i also need to add these parsed items to another lucene document.
>
> is there a way, to receive/parse these items just in memory, without
> committing them to lucene?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Yes, I don't really want to index/store the PDF document in Lucene.

I just need the parsed tokens for other things.

So you mean I can use ExtractingRequestHandler.java to retrieve the items.

Does anybody have a piece of code doing that?

Actually, I give the PDF as input and want the parsed items (the same as what
would be in the "text" field in the stored Lucene doc).





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
OK, seems this works:

  import org.apache.tika.Tika;
  // Extract the plain text from the PDF entirely in memory
  Tika tika = new Tika();
  String tokens = tika.parseToString(file);




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032649.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Accessing raw index data

2013-01-11 Thread Shawn Heisey

On 1/11/2013 1:33 PM, Achim Domma wrote:

"At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level."

That's what I'm looking for. I understand that, at the end, there has to be an inverted index (or rather 
multiple of them), holding all "words" which occur in my documents, each "word" having 
a list of documents the "word" was part of. I would like to do some statistics based on this 
information, would like to analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like "Data is stored in Lucene indexes, which 
are documented at XXX. In a request handler you can access the indexes via YYY.", I 
would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a 
bit limited, so it's hard to find an entry point.


There is the TermsComponent, which can be utilized in a terms 
requestHandler.  The example solrconfig.xml found in all downloaded 
copies of Solr has a /terms request handler.


http://wiki.apache.org/solr/TermsComponent
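
For example, a minimal request to dump the most frequent terms of a field 
(the field name is illustrative):

  http://localhost:8983/solr/terms?terms.fl=text&terms.limit=20&terms.sort=count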

As you've already been told, there is a tool called Luke, but a version 
that works with Solr 4.0.0 is hard to find.  The official download 
location only has a 4.0.0-ALPHA version, and there have been reported 
problems using it with indexes from the final Solr 4.0.0.


Thanks,
Shawn



Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
Look at the extractOnly parameter. 

But doing this in your client is the more recommended way, to keep 
Solr from getting beat up too badly. 

Erik

On Jan 11, 2013, at 15:55, uwe72  wrote:

> i have a bit strange usecase.
> 
> when i index a pdf to solr i use ContentStreamUpdateRequest.
> The lucene document then contains in the "text" field all containing items
> (the parsed items of the physical pdf).
> 
> i also need to add these parsed items to another lucene document.
> 
> is there a way, to receive/parse these items just in memory, without
> committing them to lucene?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Erik, what do you mean by this parameter? I don't find it...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032656.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
It's an ExtractingRequestHandler parameter (see the wiki).  Not quite sure the 
Java incantation to set that but definitely possible. 
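
Something along these lines ought to work (an untested sketch; "server" is 
assumed to be your SolrServer instance):

  import java.io.File;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
  import org.apache.solr.common.util.NamedList;

  // Send the file to /update/extract, but only return the extracted
  // content -- nothing is indexed or committed.
  ContentStreamUpdateRequest req =
      new ContentStreamUpdateRequest("/update/extract");
  req.addFile(new File("doc.pdf"), "application/pdf");
  req.setParam("extractOnly", "true");
  NamedList<Object> result = server.request(req);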
 
Erik

On Jan 11, 2013, at 17:14, uwe72  wrote:

> Erik, what do you mean by this parameter? I don't find it...
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032656.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Awesome!

This one line did the trick:

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032671.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: retrieving latest document **only**

2013-01-11 Thread J Mohamed Zahoor
Cool… it worked… But the count of all the groups and the count inside the stats 
component does not match…
Is that a bug?

./zahoor


On 11-Jan-2013, at 6:48 PM, Upayavira  wrote:

> could you use field collapsing? Boost by date and only show one value
> per group, and you'll have the most recent document only.
> 
> Upayavira
> 
> On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
>> one crude way is first query and pick the latest date from the result
>> then issue a query with q=timestamp[latestDate TO latestDate]
>> 
>> But i dont want to execute two queries...
>> 
>> ./zahoor
>> 
>> On 11-Jan-2013, at 6:37 PM, jmozah  wrote:
>> 
>>> 
>>> 
>>> 
 What do you want?
 'the most recent ones' or '**only** the latest' ?
 
 Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your needs.
 
 Uwe
 
>>> 
>>> 
>>> I need **only** the latest documents...
>>> in the above query , "refdate" can vary based on the query.
>>> 
>>> ./zahoor
>>> 
>>> 
>>> 
>> 



Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Not sure exactly what you mean, can you give an example?
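
In the meantime, for reference, the kind of grouping request I had in mind 
looks something like this (the field names are illustrative):

  /solr/select?q=*:*&group=true&group.field=itemId&group.limit=1&sort=timestamp+desc

With group.limit=1, the top document within each itemId group is picked by the 
sort parameter, so each group returns only its most recent document.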

Upayavira

On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote:
> Cool… it worked… But the count of all the groups and the count inside
> stats component does not match…
> Is that a bug?
> 
> ./zahoor
> 
> 
> On 11-Jan-2013, at 6:48 PM, Upayavira  wrote:
> 
> > could you use field collapsing? Boost by date and only show one value
> > per group, and you'll have the most recent document only.
> > 
> > Upayavira
> > 
> > On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
> >> one crude way is first query and pick the latest date from the result
> >> then issue a query with q=timestamp[latestDate TO latestDate]
> >> 
> >> But i dont want to execute two queries...
> >> 
> >> ./zahoor
> >> 
> >> On 11-Jan-2013, at 6:37 PM, jmozah  wrote:
> >> 
> >>> 
> >>> 
> >>> 
>  What do you want?
>  'the most recent ones' or '**only** the latest' ?
>  
>  Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your 
>  needs.
>  
>  Uwe
>  
> >>> 
> >>> 
> >>> I need **only** the latest documents...
> >>> in the above query , "refdate" can vary based on the query.
> >>> 
> >>> ./zahoor
> >>> 
> >>> 
> >>> 
> >> 
>