Adding field to solr dynamically

2013-10-13 Thread Mysurf Mail
My database model uses dynamic attributes (Entity-Attribute-Value model).
For the DB I have a service that adds a new attribute, but every time a new
attribute is added I need to add it to schema.xml.

Is there a way to add a field to Solr's schema.xml dynamically?


Re: Adding field to solr dynamically

2013-10-15 Thread Mysurf Mail
Thanks.


On Sun, Oct 13, 2013 at 4:18 PM, Jack Krupansky wrote:

> Either simply use a dynamic field, or use the Schema API to add a static
> field:
> https://cwiki.apache.org/confluence/display/solr/Schema+API
>
> Dynamic fields (your nominal field name plus a suffix that specifies the
> type and multiplicity - as detailed in the Solr example schema) may be good
> enough, depending on the rest of your requirements.
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Sunday, October 13, 2013 5:32 AM
> To: solr-user@lucene.apache.org
> Subject: Adding field to solr dynamically
>
>
> My database model uses dynamic attributes (Entity-Attribute-Value model).
> For the DB I have a service that adds a new attribute, but every time a
> new attribute is added I need to add it to schema.xml.
>
> Is there a way to add a field to Solr's schema.xml dynamically?
>
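The dynamic-field approach from the reply can be sketched in schema.xml like this (a sketch modeled on the Solr example schema; the suffix names and types are illustrative, not taken from the thread):

```xml
<!-- Any field ending in "_s" is indexed as a single string, and any
     field ending in "_ss" as a multi-valued string. New EAV attributes
     can then be sent as e.g. color_s or sizes_ss with no schema edit. -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
```

Documents then carry attributes such as `<field name="color_s">red</field>` without any change to schema.xml.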


Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I have a relational database model.
This is the basics of my data-config.xml (the inner entities look like
this):

<document>
  <entity name="..." query="...">
    <entity name="..."
            query="select SKU
                   FROM [TableB]
                   INNER JOIN ... ON ...
                   INNER JOIN ... ON ...
                   WHERE ... AND ...">
    </entity>
  </entity>
</document>

Now, this takes a lot of time.
The first query returns many rows, and then the inner entities are fetched
separately for each row (around 10 rows each).

If I use a DB profiler I see the three inner-entity queries running over
and over (3 SELECT statements, then another 3 SELECT statements, over and
over). This is really not efficient,
and the import can run over 40 hours.
Now,
what are my options to make it run faster?
1. Obviously there is the option to flatten the tables into one big table,
but that would create a lot of other side effects.
I would really like to avoid that extra effort and run Solr on my
production relational tables.
So far it works great out of the box, and I am asking here whether there
is a configuration tweak.
2. If I do flatten the rows, does schema.xml need to change too, or will
the fields that are multivalued stay multivalued?

Thanks.


Re: Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I just configured the caching and it works mighty fast now.
Instead of an unbelievable number of queries it now queries only 4 times.
CPU usage has moved from the DB to the Solr machine, but only for a very
short time.

Problem:
I don't see the multi-value fields (inner entities) anymore.
This is my configuration:







 


Now, when I query
http://localhost:8983/solr/vaultCache/select?q=*&indent=true
it returns only the main entity's attributes.
Where are my inner entities' attributes now?
Thanks a lot.







On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty  wrote:

> On 27 June 2013 12:32, Mysurf Mail  wrote:
> >
> > I have a relational database model
> > This is the basics of my data-config.xml
> >
> > 
> >  
> > 
> >  > query="select SKU
> >  FROM [TableB]
> > INNER JOIN ...
> > ON ...
> >  INNER JOIN ...
> > ON ...
> > WHERE ... AND ...'">
> >  
> > 
> >
> > Now, this takes a lot of time.
> > The first query returns many rows, and then the inner entities are
> > fetched separately for each row (around 10 rows each).
> >
> > If I use a DB profiler I see the three inner-entity queries running
> > over and over (3 SELECT statements, then another 3, over and over).
> > This is really not efficient, and the import can run over 40 hours.
> > Now,
> > what are my options to make it run faster?
> > 1. Obviously there is the option to flatten the tables into one big
> > table, but that would create a lot of other side effects.
> > I would really like to avoid that extra effort and run Solr on my
> > production relational tables.
> > So far it works great out of the box, and I am asking here whether
> > there is a configuration tweak.
> > 2. If I do flatten the rows, does schema.xml need to change too, or
> > will the fields that are multivalued stay multivalued?
>
> You have not shared your actual queries, so it is difficult
> to tell, but my guess would be that it is the JOINs that
> are the bottle-neck rather than the SELECTs. You should
> start by:
> 1. Profile queries from the database back-end to see
> which are taking the most time, and try to simplify
> them.
> 2. Make sure that relevant database columns are indexed.
> This can make a huge difference, though going overboard
> in indexing all columns might be counter-productive.
> 3. Use Solr DIH's CachedSqlEntityProcessor:
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
> 4. Measure the time that Solr indexing takes: From your
> description, you seem to be guessing at it.
>
> In general, you should not flatten the records in the
> database as that is supposed to be relational data.
>
> Regards,
> Gora
>
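Gora's point 3 (CachedSqlEntityProcessor) typically looks like this in data-config.xml - a sketch with placeholder table and column names, not the poster's actual config:

```xml
<document>
  <entity name="item" query="SELECT Id, Name FROM [dbo].[Item]">
    <!-- The child query runs ONCE; its results are cached in memory
         and joined to each parent row via cacheKey/cacheLookup,
         instead of issuing one SELECT per parent row. -->
    <entity name="tag"
            processor="CachedSqlEntityProcessor"
            cacheKey="ItemId"
            cacheLookup="item.Id"
            query="SELECT ItemId, TagText FROM [dbo].[Tag]"/>
  </entity>
</document>
```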


Solr - Delta Query Via Full Import

2013-07-02 Thread Mysurf Mail
I am using DIH to fetch rows from the DB into Solr.
I have many 1:n relations, and the import is fast enough only if I use
caching (super fast). Therefore I am adding the following attributes to my
inner entity:

processor="CachedSqlEntityProcessor" cacheKey="" cacheLookup=""

Everything works great and fast. (First the n-side tables are queried, then
the main entity.)

Now I want to configure the delta import, and it is not actually working.

I know that by the standard approach I need to define the following
attributes:

   1. query - the initial query
   2. deltaQuery - the rows that were changed
   3. deltaImportQuery - fetches the data that was changed
   4. parentDeltaQuery - the keys of the parent entity that has changed
   rows in the current entity

(2-4 are only used in delta imports.)

And I have seen a hack in the documentation that lets you run a delta
import via full-import: instead of adding all of query, deltaImportQuery
and deltaQuery, I can add just query and call full-import instead of
delta-import.

Problem - only the first query (main entity) is executed when I run the
full import without clean.

Here is a part of my configuration in data-config.xml (I have left
deltaImportQuery in, though I call only full-import):






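The "delta via full-import" hack documented in the DIH FAQ relies on every entity's query - including the inner ones - guarding on last_index_time itself; that would explain why only the main entity is restricted here. A sketch with placeholder names:

```xml
<entity name="item"
        query="SELECT Id, Name FROM [dbo].[Item]
               WHERE '${dataimporter.request.clean}' != 'false'
                  OR LastModified > '${dataimporter.last_index_time}'">
  <!-- Inner entities need the same guard; otherwise their query
       attribute runs unrestricted on a non-clean full-import. -->
  <entity name="tag"
          query="SELECT ItemId, TagText FROM [dbo].[Tag]
                 WHERE '${dataimporter.request.clean}' != 'false'
                    OR TagModified > '${dataimporter.last_index_time}'"/>
</entity>
```

Run it with command=full-import&clean=false; with clean=true the first condition is true for every row and everything is re-imported.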

parentDeltaQuery does not run

2013-07-02 Thread Mysurf Mail
I have a 1:n relation between my main entity (PackageVersion) and its tags
in my DB.

I add a new tag with the current timestamp to the DB and run the delta
import command.
The deltaQuery select retrieves the changed line, but I don't see any
other SQL run after it.
Here are my data-config.xml configurations:








Solr - working with delta import and cache

2013-07-02 Thread Mysurf Mail
I have two entities in a 1:n relation - PackageVersion and Tag.
I have configured DIH to use CachedSqlEntityProcessor and everything works
as planned.
First, the Tag entity is selected using the query attribute, then the main
entity.
Ultra fast.

Now I am adding the delta import.
Everything runs and loads, but too slowly.
Looking at the DB profiler output I see:

   1. The delta queries of the inner entities run first - which is good.
   2. The delta query of the main entity runs later - which is still good.
   3. The deltaImportQuery of the main entity runs as a single SELECT for
   each of the IDs. Could this be improved by using a "WHERE IN" over all
   the results? Is that possible?
   4. All of the query attributes of the other tables run now. This is
   bad. (In real life I have more than one table in a 1:n connection.) For
   instance I get a lot of runs of

   select ResourceId,[Text] PackageTag
   from [dbo].[Tag] Tag
   Where  ResourceType = 0

   Because it comes from the query attribute, there is no WHERE clause
   restricting it to the changed IDs.

a. How can I fix it?
b. Can I change the deltaImportQuery to use "WHERE IN"?
c. There is no real ordering for all the SELECTs when requesting a delta
import. Is it possible to use the caching when updating via delta too?

Here is my configuration:

<entity name="PackageVersion"
        query="select ...
               from [dbo].[Package] Package
               inner join [dbo].[PackageVersion] PackageVersion
               on Package.Id = PackageVersion.PackageId"
        deltaQuery="select PackageVersion.Id PackageVersionId
                    from [dbo].[Package] Package
                    inner join [dbo].[PackageVersion] PackageVersion
                    on Package.Id = PackageVersion.PackageId
                    where Package.LastModificationTime > '${dataimporter.last_index_time}'
                       OR PackageVersion.Timestamp > '${dih.last_index_time}'"
        deltaImportQuery="select ...
                          from [dbo].[Package] Package
                          inner join [dbo].[PackageVersion] PackageVersion
                          on Package.Id = PackageVersion.PackageId
                          where PackageVersion.Id='${dih.delta.PackageVersionId}'">

    <entity name="PackageTag"
            processor="CachedSqlEntityProcessor"
            cacheKey="ResourceId"
            cacheLookup="PackageVersion.PackageId"
            query="select ResourceId,[Text] PackageTag
                   from [dbo].[Tag] Tag
                   Where ResourceType = 0"
            deltaQuery="select ResourceId,[Text] PackageTag
                        from [dbo].[Tag] Tag
                        Where ResourceType = 0
                          and Tag.TimeStamp > '${dih.last_index_time}'"
            parentDeltaQuery="select PackageVersion.PackageVersionId
                              from [dbo].[Package]
                              where Package.Id=${PackageTag.ResourceId}"/>
</entity>


Re: Solr - working with delta import and cache

2013-07-02 Thread Mysurf Mail
BTW: I just found out that delta import is only supported by the
SqlEntityProcessor.
Does it matter that I defined processor="CachedSqlEntityProcessor"?


On Tue, Jul 2, 2013 at 5:58 PM, Mysurf Mail  wrote:

> I have two entities in 1:n relation - PackageVersion and Tag.
> I have configured DIH to use CachedSqlEntityProcessor and everything works
> as planned.
> First, Tag entity is selected using the query attribute. Then the main
> entity.
> Ultra Fast.
>
> Now I am adding the delta import.
> Everything runs and loads, but too slow.
> Looking at the db profiler output i see :
>
>1. the delta query of the inner entities run first - which is good.
>2. the delta query of the main entities runs later - which is still
>good.
>3. deltaImportQuery of the main entity with each of the ID's runs as a
>single select can be improved using "where in" all the result. Is it
>possible?
>4.
>
>All of the Query attribute of the other tables are running now. This
>is bad. (In real life I have more than one table in 1:n connection). for
>instance I get a lot of
>
>select ResourceId,[Text] PackageTag
>from [dbo].[Tag] Tag
>Where  ResourceType = 0
>
>
> run. Because it is from the Query attribute - there is no where clause for
> using the ids.
> a. How can I fix it ?
> b. Can I translate the importquery to use "where in"
> c. There is no real order for all the select when requesting deltaImport.
> is it possible to implement the caching also when updating delta?
>
> Here is my configuration
>
>  query=  "select 
> from [dbo].[Package] Package inner join 
> [dbo].[PackageVersion] PackageVersion on Package.Id = 
> PackageVersion.PackageId"
> deltaQuery = "select PackageVersion.Id PackageVersionId
>   from [dbo].[Package] Package inner join 
> [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId
>   where Package.LastModificationTime > 
> '${dataimporter.last_index_time}' OR PackageVersion.Timestamp > 
> '${dih.last_index_time}'"
> deltaImportQuery=" select 
> from [dbo].[Package] Package inner join 
> [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId
> Where PackageVersion.Id='${dih.delta.PackageVersionId}'" >
>
>  processor="CachedSqlEntityProcessor" cacheKey="ResourceId" 
> cacheLookup="PackageVersion.PackageId"
> query="select ResourceId,[Text] PackageTag
>from [dbo].[Tag] Tag
>Where ResourceType = 0"
> deltaQuery="select ResourceId,[Text] PackageTag
> from [dbo].[Tag] Tag
> Where ResourceType = 0 and Tag.TimeStamp 
> > '${dih.last_index_time}'"
> parentDeltaQuery="select 
> PackageVersion.PackageVersionId
>   from [dbo].[Package]
>   where 
> Package.Id=${PackageTag.ResourceId}">
>
> 
>
>


two types of answers in my query

2013-07-08 Thread Mysurf Mail
Hi,
A general question:


Let's say I have Car and CarPart in a 1:n relation.

And I have discovered that the user entered a part serial number (SKU) in
the search field instead of a car name.
(I discovered it using a regex.)

Is there a way to fetch different types of answers in Solr?
Is there a way to fetch mixed types in the answers?
Is there something similar to that, and what is that feature called?

Thank you.


Re: two types of answers in my query

2013-07-10 Thread Mysurf Mail
This will work.
Thanks.


On Tue, Jul 9, 2013 at 4:37 PM, Jack Krupansky wrote:

> Usually a car term and a car part term will look radically different. So,
> simply use the edismax query parser and set "qf" to be both the car and car
> part fields. If either matches, the document will be selected. And if you
> have a "type" field, you can check that to see if a car or part was matched
> in the results.
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Tuesday, July 09, 2013 2:38 AM
> To: solr-user@lucene.apache.org
> Subject: two types of answers in my query
>
>
> Hi,
> A general question:
>
>
> Let's say I have Car And CarParts 1:n relation.
>
> And I have discovered that the user had entered in the search field instead
> of car name - a part serial number (SKU).
> (I discovered it useing regex)
>
> Is there a way to fetch different types of answers in Solr?
> Is there a way to fetch mixed types in the answers?
> Is there something similiar to that and how is that feature called?
>
> Thank you.
>


Disabling word breaking for codes and SKUs

2013-07-10 Thread Mysurf Mail
Some of the data in my index consists of SKUs and barcodes such as
ASDF3-DASDD-2133DD-21H44

I want to disable the word breaking for this type of value (maybe through a
regex). Is there a possible way to do this?
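One way to switch off word breaking for such values - a sketch using standard Solr analysis factories, not tied to the poster's schema - is a field type that keeps the whole value as a single token:

```xml
<fieldType name="sku_string" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- Keep the entire value (e.g. ASDF3-DASDD-2133DD-21H44) as
         one token; lowercase it so matching is case-insensitive. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A plain `string` field would also avoid word breaking, but the analyzer above additionally gives case-insensitive matching.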


Running Solr in a cluster - high availability only

2013-07-15 Thread Mysurf Mail
Hi,
I would like to run two Solr instances on different computers as a cluster.
My main interest is high availability - meaning, in case one server crashes
or is down, there will always be another one.

(My performance on a single instance is great. I do not need to split the
data across two servers.)

Questions:
1. What is the best practice?
Is it different from clustering for index splitting? Do I need shards?
2. Do I need ZooKeeper?
3. Is it a container-based configuration (different for Jetty and Tomcat)?
4. Do I need an external NLB for that?
5. When one computer comes back up after crashing, how does it update its
index?


adding date column to the index

2013-07-22 Thread Mysurf Mail
I have added a date field to my index.
I don't want the query to search on this field, but I want it to be
returned with each row.
So I have defined it in the schema.xml as follows:

<field name="..." ... stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the
profiler.
Now, when I query all fields (using the dashboard) I don't see it.
Even when I ask for it specifically I don't see it.
What am I doing wrong?

(In the DB it is datetimeoffset(7).)


deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
When I request a JSON result I get the following structure in the
highlighting section:

{"highlighting":{
   "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
      "PackageName":["- Testing channel twenty."]},
   "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
      "PackageName":["- Testing channel twenty."]},
   "0a699062-cd09-4b2e-a817-330193a352c1":{
      "PackageName":["- Testing channel twenty."]},
   "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
      "PackageName":["- Testing channel twenty."]}}}


It is difficult to deserialize this JSON because the GUID is in the
attribute name.
Is that solvable (using C#)?


Re: adding date column to the index

2013-07-22 Thread Mysurf Mail
To clarify: I did delete the data in the index and reloaded it (+ commit).
(As I said, I have seen it loaded in the DB profiler.)
Thanks for your comment.


On Mon, Jul 22, 2013 at 9:25 PM, Lance Norskog  wrote:

> Solr/Lucene does not automatically add when asked, the way DBMS systems
> do. Instead, all data for a field is added at the same time. To get the new
> field, you have to reload all of your data.
>
> This is also true for deleting fields. If you remove a field, that data
> does not go away until you re-index.
>
>
> On 07/22/2013 07:31 AM, Mysurf Mail wrote:
>
>> I have added a date field to my index.
>> I dont want the query to search on this field, but I want it to be
>> returned
>> with each row.
>> So I have defined it in the scema.xml as follows:
>>> stored="true" required="true"/>
>>
>>
>>
>> I added it to the select in data-config.xml and I see it selected in the
>> profiler.
>> now, when I query all fileds (using the dashboard) I dont see it.
>> Even when I ask for it specifically I dont see it.
>> What am I doing wrong?
>>
>> (In the db it is (datetimeoffset(7)))
>>
>>
>


Re: deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
The GUID appears as the attribute name, and not as

"id":"baf8434a-99a4-4046-8a4d-2f7ec09eafc8"

Trying to create an object that holds this GUID would create an attribute
named baf8434a-99a4-4046-8a4d-2f7ec09eafc8.

On Mon, Jul 22, 2013 at 6:30 PM, Jack Krupansky wrote:

> Exactly why is it difficult to deserialize? Seems simple enough.
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail Sent: Monday, July 22, 2013
> 11:14 AM To: solr-user@lucene.apache.org Subject: deserializing
> highlighting json result
> When I request a json result I get the following streucture in the
> highlighting
>
> {"highlighting":{
>   "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
>  "PackageName":["- Testing channel twenty."]},
>   "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
>  "PackageName":["- Testing channel twenty."]},
>   "0a699062-cd09-4b2e-a817-330193a352c1":{
> "PackageName":["- Testing channel twenty."]},
>   "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
> "PackageName":["- Testing channel twenty."]}}}
>
>
> It is difficult to deserialize this json because the guid is in the
> attribute name.
> Is that solveable (using c#)?
>
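The key is to deserialize highlighting as a dictionary keyed by document id rather than as a class with fixed property names (in C# that would be a Dictionary<string, Dictionary<string, List<string>>>). A Python sketch of the idea, using a trimmed copy of the response from the thread:

```python
import json

# A trimmed copy of the highlighting section from the thread.
raw = """{"highlighting": {
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9": {"PackageName": ["- Testing channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8": {"PackageName": ["- Testing channel twenty."]}
}}"""

data = json.loads(raw)

# Iterate the document ids as dictionary keys instead of mapping
# them to statically named attributes.
snippets = {doc_id: fields["PackageName"]
            for doc_id, fields in data["highlighting"].items()}

for doc_id, names in snippets.items():
    print(doc_id, names[0])
```

The same shape works in any JSON library that supports map/dictionary deserialization.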


Re: adding date column to the index

2013-07-23 Thread Mysurf Mail
Aha -
I deleted the data folder and now I get:
Invalid Date String:'2010-01-01 00:00:00 +02:00'
I need to cast it to a Solr date, as I read it into the schema using

<field name="..." ... stored="true" required="true"/>

On Tue, Jul 23, 2013 at 10:50 AM, Gora Mohanty  wrote:

> On 23 July 2013 11:13, Mysurf Mail  wrote:
> > clarify: I did deleted the data in the index and reloaded it (+ commit).
> > (As i said, I have seen it loaded in the sb profiler)
> [...]
>
> Please share your DIH configuration file, and Solr's
> schema.xml. It must be that somehow the column
> is not getting indexed.
>
> Regards,
> Gora
>


Re: adding date column to the index

2013-07-23 Thread Mysurf Mail
How do I cast a datetimeoffset(7) to a Solr date?


On Tue, Jul 23, 2013 at 11:11 AM, Mysurf Mail  wrote:

> Ahaa
> I deleted the data folder and now I get
> Invalid Date String:'2010-01-01 00:00:00 +02:00'
> I need to cast it to solr. as I read it in the schema using
>
>  stored="true" required="true"/>
>
>
> On Tue, Jul 23, 2013 at 10:50 AM, Gora Mohanty  wrote:
>
>> On 23 July 2013 11:13, Mysurf Mail  wrote:
>> > clarify: I did deleted the data in the index and reloaded it (+ commit).
>> > (As i said, I have seen it loaded in the sb profiler)
>> [...]
>>
>> Please share your DIH configuration file, and Solr's
>> schema.xml. It must be that somehow the column
>> is not getting indexed.
>>
>> Regards,
>> Gora
>>
>
>
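Solr's date field expects ISO 8601 UTC (e.g. 2010-01-01T00:00:00Z). One way to get there from a SQL Server datetimeoffset - a sketch assuming DIH with SQL Server; column and entity names are placeholders and the CONVERT style may need adjusting - is to normalize to UTC in the SELECT and let DIH's DateFormatTransformer parse the string:

```xml
<entity name="doc" transformer="DateFormatTransformer"
        query="SELECT Id,
                      CONVERT(varchar(30),
                              CAST(SWITCHOFFSET(CreationDate, '+00:00') AS datetime),
                              126) AS CreationDate
               FROM [dbo].[MyDoc]">
  <!-- SWITCHOFFSET shifts the value to UTC; style 126 emits
       yyyy-MM-ddTHH:mm:ss..., which the transformer parses below. -->
  <field column="CreationDate" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
</entity>
```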


solr - Deleting a row from the index, using the configuration files only.

2013-07-23 Thread Mysurf Mail
I am updating my Solr index using the deltaQuery and deltaImportQuery
attributes in data-config.xml.
In my condition I write:

where MyDoc.LastModificationTime > '${dataimporter.last_index_time}'

Then, after I add a row, I trigger an update using data-config.xml.

Now, sometimes I delete a row.
How can I implement this with the configuration files only
(without sending a delete REST command to Solr)?

Let's say my object is not deleted, but its status is changed to deleted.
I don't index that status field, as I want to hold only the live rows
(otherwise I could have just filtered on it).
Is there a way to do it?
Thanks.
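DIH does support this in configuration via the deletedPkQuery attribute, which runs during delta-import and removes the returned keys from the index. A sketch assuming the status column described above (table and column names are placeholders):

```xml
<entity name="doc"
        pk="Id"
        query="SELECT Id, ... FROM MyDoc WHERE Status != 'deleted'"
        deltaQuery="SELECT Id FROM MyDoc
                    WHERE LastModificationTime > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT Id, ... FROM MyDoc WHERE Id='${dih.delta.Id}'"
        deletedPkQuery="SELECT Id FROM MyDoc
                        WHERE Status = 'deleted'
                          AND LastModificationTime > '${dataimporter.last_index_time}'"/>
```

On each delta-import, rows whose status flipped to deleted since the last run are purged from the index without any external delete command.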


filter query result by user

2013-07-23 Thread Mysurf Mail
I want to restrict the returned results to only the documents that were
created by the user.
So I load the createdBy attribute into the index and set it to
indexed="false", stored="true":

<field name="CreatedBy" ... indexed="false" stored="true" required="true"/>

Then I want to filter by "CreatedBy", so I use the dashboard, check
edismax, and add CreatedBy:user1 to the qf field.

The resulting query is

http://...:8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1

Nothing is filtered; all rows are returned.
What am I doing wrong?


Re: filter query result by user

2013-07-23 Thread Mysurf Mail
But I don't want it to be searched on.

Let's say the user name is "giraffe".
I do want the filter to be "where createdBy = giraffe".

But when the user searches for his name, I will want only documents with
the name "Giraffe".
Since it is indexed, wouldn't it return all rows created by him?
Thanks.



On Tue, Jul 23, 2013 at 4:28 PM, Raymond Wiker  wrote:

> Simple: the field needs to be "indexed" in order to search (or filter) on
> it.
>
>
> On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail 
> wrote:
>
> > I want to restrict the returned results to be only the documents that
> were
> > created by the user.
> > I then load to the index the createdBy attribute and set it to index
> > false,stored="true"
> >
> >  > required="true"/>
> >
> > then in the I want to filter by "CreatedBy" so I use the dashboard, check
> > edismax and add
> > I check edismax and add CreatedBy:user1 to the qf field.
> >
> >
> > the result query is
> >
> > http://
> > :8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1
> >
> > Nothing is filtered. all rows returned.
> > What was I doing wrong?
> >
>


Re: filter query result by user

2013-07-23 Thread Mysurf Mail
I am probably using it wrong.
http://...:8983/solr/vault10k/select?q=*%3A*&defType=edismax&qf=CreatedBy%BLABLA
returns all rows.
It ignores my qf filter.

Should I even use qf for filtering with edismax?
(It doesn't say so in the doc:
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29)



On Tue, Jul 23, 2013 at 4:32 PM, Mysurf Mail  wrote:

> But I dont want it to be searched.on
>
> lets say the user name is "giraffe"
> I do want to filter to be "where created by = giraffe"
>
> but when the user searches his name, I will want only documents with name
> "Giraffe".
> since it is indexed, wouldn't it return all rows created by him?
> Thanks.
>
>
>
> On Tue, Jul 23, 2013 at 4:28 PM, Raymond Wiker  wrote:
>
>> Simple: the field needs to be "indexed" in order to search (or filter) on
>> it.
>>
>>
>> On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail 
>> wrote:
>>
>> > I want to restrict the returned results to be only the documents that
>> were
>> > created by the user.
>> > I then load to the index the createdBy attribute and set it to index
>> > false,stored="true"
>> >
>> > > > required="true"/>
>> >
>> > then in the I want to filter by "CreatedBy" so I use the dashboard,
>> check
>> > edismax and add
>> > I check edismax and add CreatedBy:user1 to the qf field.
>> >
>> >
>> > the result query is
>> >
>> > http://
>> > :8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1
>> >
>> > Nothing is filtered. all rows returned.
>> > What was I doing wrong?
>> >
>>
>
>
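Putting Raymond's answer together: qf only tells edismax which fields to search for the q terms; restricting results is the job of the fq parameter, and the field must be indexed to be filterable. A sketch (field definition and URL are illustrative):

```xml
<!-- schema.xml: the field must be indexed to filter on it;
     a "string" type matches the whole value, case-sensitively,
     so filtering on it does not interfere with text search
     against the other fields. -->
<field name="CreatedBy" type="string" indexed="true" stored="true" required="true"/>

<!-- query: restrict with fq instead of qf, e.g.
     http://localhost:8983/solr/vault/select?q=*:*&fq=CreatedBy:user1 -->
```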


solr - set fields as default search fields

2013-07-29 Thread Mysurf Mail
The following query works well for me:

http://[]:8983/solr/vault/select?q=VersionComments%3AWhite

It returns all the documents where VersionComments includes "White".

I try to omit the field name and set it as a default instead, as follows:
in solrconfig.xml I write

 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">PackageName</str>
   <str name="df">Tag</str>
   <str name="df">VersionComments</str>
   <str name="df">VersionTag</str>
   <str name="df">Description</str>
   <str name="df">SKU</str>
   <str name="df">SKUDesc</str>
 </lst>

I restart Solr and run a full import.
Then I try

 http://[]:8983/solr/vault/select?q=White

(where

 http://[]:8983/solr/vault/select?q=VersionComments%3AWhite

still works),

but I don't get any document as an answer.
What am I doing wrong?
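As the reply in the next thread explains, df accepts only a single field, so listing several df entries does not work; with edismax the usual approach is a single space-delimited qf default. A sketch for solrconfig.xml (handler name and boosts are illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <!-- one qf string, space-delimited, with optional ^boosts -->
    <str name="qf">PackageName Tag VersionComments VersionTag Description SKU SKUDesc</str>
  </lst>
</requestHandler>
```

With this in place, q=White searches all of the listed fields without naming them in the URL.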


Working with solr over two different db schemas

2013-07-31 Thread Mysurf Mail
I've been working on this for quite some time.

This is my config (a cached inner entity inside an outer entity):

<document>
  <entity name="..." query="...">
    <entity name="..."
            processor="CachedSqlEntityProcessor"
            cacheKey="..." cacheLookup="..."
            query="..."/>
  </entity>
</document>

Now, this runs in my test env, and the only thing I do is change the
configuration to another DB (and, as a result, also the schema name from
[dbo] to another one).
This results in totally different behavior.
In the first configuration the SELECTs were done in this order: inner
entity and then outer entity - which means that the cache works.
In the second configuration, over the other DB, the order was first the
outer and then the inner; the cache did not work at all.
The inner query's results are not cached at all.

What could be the problem?


solr - please help me arrange my search url

2013-08-01 Thread Mysurf Mail
I am still doing something wrong with Solr.

I am querying with the following parameters:

http://...:8983/solr/vault/select?q=jump&qf=PackageTag&defType=edismax

(meaning I am using edismax and I query on the field PackageTag)

I get nothing.

When I don't declare the field in the query

http://...:8983/solr/vault/select?q=jump&defType=edismax

and instead declare the searched-on fields in

<lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">PackageName</str>
   <str name="df">PackageTag</str>
</lst>

I also get nothing.

It only works when I query with

http://...:8983/solr/vault/select?q=PackageTag:walk&defType=edismax

My goal is to have two kinds of URL:

   1. one that queries without passing the "searched on" fields.
   I will put a default declaration in another place (where, then?)
   2. one that queries while passing the "searched on" fields.
   Should I use dismax? edismax? qf or q=field:value?

Thanks.


Re: solr - please help me arrange my search url

2013-08-04 Thread Mysurf Mail
So,
if I always query over more than one field, and they are always the same
fields, then I cannot place them in a config file?
Should I always list all of them in my URL?



On Thu, Aug 1, 2013 at 5:05 PM, Jack Krupansky wrote:

> 1. "df" only supports a single field. All but the first will be ignored.
> 2. "qf" takes a list as a space-delimited string, with optional boost (^n)
> after each field name.
> 3. "df" is only used by edismax if "qf" is not present.
> 4. Your working query uses a different term ("walk") than your other
> queries ("jump").
>
> Are you sure that "jump" appears in that field? What does your field
> analyzer look like? Or is it a string field? If the latter, does the case
> match exactly and are there any extraneous spaces?
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Thursday, August 01, 2013 7:48 AM
> To: solr-user@lucene.apache.org
> Subject: solr - please help me arrange my search url
>
>
> I am still doing something wrong with solr.
>
> I am querying with the following parameters
>
> http://...:8983/solr/vault/select?q=jump&qf=PackageTag&defType=edismax
>
> (meaning I am using edismax and I query on the field PackageTag )
>
> I get nothing.
>
> when I dont declare the field and query
>
> http://...:8983/solr/vault/select?q=jump&defType=edismax
>
> and declare the searched on fileds in
>
> 
>   explicit
>   10
>   PackageName
>   PackageTag
>   
>
> I get also nothing
>
> Its only when I query with
>
> http://...:8983/solr/vault/select?q=PackageTag:walk&defType=edismax
>
> My goal is to have two kinds of url -
>
>   1. one that will query without getting the "SearchedOn" fields.
>
>   I will put default declaration in another place (where then?)
>   2. one that will query with getting the "SearchedOn" fields.
>
>   should I use dismax?edismax? qf or q=..:...
>
> Thanks.
>


solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Mysurf Mail
When I query using

http://localhost:8983/solr/vault/select?q=*:*

I get results including the following:

<doc>
  ...
  <int name="VersionNumber">7</int>
  ...
</doc>

Now I try to get only that row, so I add fq=VersionNumber:7 to my query:

http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7

And I get nothing.
Any idea?


Re: solr - using fq parameter does not retrieve an answer

2013-08-06 Thread Mysurf Mail
Thanks.


On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey  wrote:

> On 8/5/2013 2:35 AM, Mysurf Mail wrote:
> > When I query using
> >
> > http://localhost:8983/solr/vault/select?q=*:*
> >
> > I get reuslts including the following
> >
> > 
> >   ...
> >   ...
> >   7
> >   ...
> > 
> >
> > Now I try to get only that row so I add to my query fq=VersionNumber:7
> >
> > http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7
> >
> > And I get nothing.
> > Any idea?
>
> Is the VersionNumber field indexed?  If it's not, you won't be able to
> search on it.
>
> If you change your schema so that the field has 'indexed="true", you'll
> have to reindex.
>
> http://wiki.apache.org/solr/HowToReindex
>
> When you are retrieving a single document, it's better to use the q
> parameter rather than the fq parameter.  Querying a single document will
> pollute the cache.  It's a lot better to pollute the queryResultCache
> than the filterCache.  The former is generally much larger than the
> latter and better able to deal with pollution.
>
> Thanks,
> Shawn
>
>


Knowing what field caused the retrival of the document

2013-08-06 Thread Mysurf Mail
I have two indexed fields in my document - Name and Comment.
The user searches for a phrase, and I need to act differently depending on
whether it appeared in the comment or the name.
Is there a way to know why the document was retrieved?
Thanks.


How to plan field boosting

2013-08-06 Thread Mysurf Mail
I query using

qf=Name+Tag

Now I want documents that have the phrase in Tag to come first, so I use

qf=Name+Tag^2

and they do appear first.


What should the rule of thumb be for the number that comes after the
field?
How do I know what number to set?


Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Mysurf Mail
But what if this is for multiple words?
I am guessing Solr knows why the document is there, since I get to see the
paragraph in the highlight (hl) section.


On Tue, Aug 6, 2013 at 11:36 AM, Raymond Wiker  wrote:

> If you were searching for single words (terms), you could use the 'tf'
> function, by adding something like
>
> matchesinname:tf(name, "whatever")
>
> to the 'fl' parameter - if the 'name' field contains "whatever", the
> (result) field 'matchesinname' will be 1.
>
>
>
>
> On Tue, Aug 6, 2013 at 10:24 AM, Mysurf Mail 
> wrote:
>
> > I have two indexed fields in my document.- Name, Comment.
> > The user searches for a phrase and I need to act differently if it
> appeared
> > in the comment or the name.
> > Is there a way to know why the document was retrieved?
> > Thanks.
> >
>


Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
My documents have 2 indexed attributes - Name (string) and Version (number).
I want documents with the same score to be displayed in the following
order:

score(desc), Name(desc), Version(desc)

Therefore I query using:

http://localhost:8983/solr/vault/select?
   q=BOM&fl=*:score&
   sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
   <str name="Name">BOM Total test2</str>
   ...
   <int name="Version">2</int>
   ...
   <float name="score">2.2388418</float>
</doc>
<doc>
   <str name="Name">BOM Total test - Copy</str>
   ...
   <int name="Version">2</int>
   ...
   <float name="score">2.2388418</float>
</doc>
<doc>
   <str name="Name">BOM Total test2</str>
   ...
   <int name="Version">1</int>
   ...
   <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the Name is not sorted.

What am I doing wrong here?


Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
My schema:

<field name="Name" ... required="true"/>
<field name="Version" ... required="true"/>


On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail  wrote:

> My documents has 2 indexed attribute - name (string) and version (number)
> I want within the same score the documents will be displayed by the
> following order
>
> score(desc),name(desc),version(desc)
>
> Therefor I query using :
>
> http://localhost:8983/solr/vault/select?
>q=BOM&fl=*:score&
>sort=score+desc,Name+desc,Version+desc
>
> And I get the following inside the result:
>
> 
>BOM Total test2
>...
>2
>...
>2.2388418
> 
> 
>BOM Total test - Copy
>...
>2
>...
>2.2388418
> 
> 
>   BOM Total test2
>   ...
>   1
>   ...
>   2.2388418
> 
>
> The scoring is equal, but the name is not sorted.
>
> What am I doing wrong here?
>


Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
I don't see how it is sorted.
This is the order as displayed above:

1 -> BOM Total test2
2 -> BOM Total test - Copy
3 -> BOM Total test2

all with the same score of 2.2388418.


On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky wrote:

> The Name field is sorted as you have requested - "desc". I suspect that
> you wanted name to be sorted "asc" (natural order.)
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Tuesday, August 06, 2013 10:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple sorting does not work as expected
>
>
> my schema
>
> 
>  required="true"/>
>  required="true"/>
> 
>
>
>
> On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail  wrote:
>
>  My documents has 2 indexed attribute - name (string) and version (number)
>> I want within the same score the documents will be displayed by the
>> following order
>>
>> score(desc),name(desc),**version(desc)
>>
>> Therefor I query using :
>>
>> http://localhost:8983/solr/**vault/select<http://localhost:8983/solr/vault/select>
>> ?
>>q=BOM&fl=*:score&
>>sort=score+desc,Name+desc,**Version+desc
>>
>> And I get the following inside the result:
>>
>> 
>>BOM Total test2
>>...
>>2
>>...
>>2.2388418
>> 
>> 
>>BOM Total test - Copy
>>...
>>2
>>...
>>2.2388418
>> 
>> 
>>   BOM Total test2
>>   ...
>>   1
>>   ...
>>   2.2388418
>> 
>>
>> The scoring is equal, but the name is not sorted.
>>
>> What am I doing wrong here?
>>
>>
>


Solr - how do I index barcode

2013-08-08 Thread Mysurf Mail
I have a document that contains the following data:

car {
id: guid
name:   string
sku:   list
}

Now, the barcodes don't have a pattern. They can be either of the
following:

ABCD-EF34GD-JOHN
ABCD-C08-YUVF

I want to index my documents so that a search for
1. ABCD will return both.
2. AB will return both.
3. JO will return ABCD-EF34GD-JOHN, but not a car with the name John.

So far I have defined car and sku as text_en,
but I don't get bullets no. 2 and 3.
Is there a better way to define the sku attribute?
Thanks.
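A common way to get prefix hits like "AB" and "JO" against SKUs only is an edge-n-gram field type on the sku field (Solr ships EdgeNGramFilterFactory for this); since the name field is left alone, "JO" would not match a car named John. The sketch below is plain Python imitating such an analysis chain - splitting on "-", lowercasing, then emitting edge n-grams of length 2 and up; the exact tokenizer and filter parameters are assumptions:

```python
def edge_ngrams(token: str, min_len: int = 2) -> list:
    """Emit all leading substrings of a token, e.g. 'abcd' -> ['ab', 'abc', 'abcd']."""
    return [token[:i] for i in range(min_len, len(token) + 1)]

def analyze_sku(barcode: str) -> set:
    """Split the barcode on '-', lowercase, then expand each part into edge n-grams."""
    grams = set()
    for part in barcode.lower().split("-"):
        grams.update(edge_ngrams(part))
    return grams

sku1 = analyze_sku("ABCD-EF34GD-JOHN")
sku2 = analyze_sku("ABCD-C08-YUVF")

print("ab" in sku1, "ab" in sku2)   # the prefix 'AB' matches both barcodes
print("jo" in sku1, "jo" in sku2)   # 'JO' matches only the first
```

At query time the typed prefix is kept as a single lowercased token and matched against these indexed grams, which also gives case-insensitive matching.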


Re: Solr - how do I index barcode

2013-08-08 Thread Mysurf Mail
Two notes:
1. My current query is similar to this:
http://127.0.0.1:8983/solr/vault/select?q=ABCD&qf=Name+SKU&defType=edismax

2. I want it to be case insensitive.



On Thu, Aug 8, 2013 at 2:52 PM, Mysurf Mail  wrote:

> I have a documnet that contains the following data
>
> car {
> id: guid
> name:   string
> sku:   list
> }
>
> Now, The barcodes dont have a pattern. It can be either one of the
> follwings:
>
> ABCD-EF34GD-JOHN
> ABCD-C08-YUVF
>
> I want to index my documents so that search for
> 1. ABCD will return both.
> 2. AB will return both.
> 3. JO - will return ABCD-EF34GD-JOHN but not car with name john.
>
> so far I have defined car and sku as text_en.
> But I dont get bulletes no 2 and 3.
> IS there a better way to define sku attribute.
> Thanks.
>


solr not writing logs when it runs not from its main folder

2013-08-13 Thread Mysurf Mail
When I run solr using

java -jar "C:\solr\example\start.jar"

It writes logs to C:\solr\example\logs.

When I run it using

java -Dsolr.solr.home="C:\solr\example\solr"
 -Djetty.home="C:\solr\example"
 -Djetty.logs="C:\solr\example\logs"
 -jar "C:\solr\example\

start.jar"

it writes logs only if I run it from

C:\solr\example>

From any other folder, logs are not written.
This is important because I need to run it as a service later (using nssm).
What should I change?


autocomplete feature - where to begin

2013-08-13 Thread Mysurf Mail
I have indexed the data from the DB, and so far it searches really well.
Now I want to create an autocomplete/suggest feature on my website.
So far I have seen articles about Suggester, spellchecker, and
searchComponents.
Can someone point me to a good article about a basic autocomplete
implementation?


Re: autocomplete feature - where to begin

2013-08-14 Thread Mysurf Mail
Thanks. Will read it now :-)


On Tue, Aug 13, 2013 at 8:33 PM, Cassandra Targett wrote:

> The autocomplete feature in Solr is built on the spell checker
> component, and is called Suggester, which is why you've seen both of
> those mentioned. It's implemented with a searchComponent and a
> requestHandler.
>
> The Solr Reference Guide has a decent overview of how to implement it
> and I just made a few edits to make what needs to be done a bit more
> clear:
>
> https://cwiki.apache.org/confluence/display/solr/Suggester
>
> If you have suggestions for improvements to that doc (such as steps
> that aren't clear), you're welcome to set up an account there and
> leave a comment.
>
> Cassandra
>
> On Tue, Aug 13, 2013 at 11:16 AM, Mysurf Mail 
> wrote:
> > I have indexed the data from the db and so far it searches really well.
> > Now I want to create auto-complete/suggest feature in my website
> > So far I have seen articles about Suggester, spellchecker, and
> > searchComponents.
> > Can someone point me to a good article about basic autocomplete
> > implementation?
>


Troubles defining suggester/ understanding results

2013-08-15 Thread Mysurf Mail
I am having trouble defining a suggester for autocomplete after reading the
tutorial.

Here are my shcema definitions:



 ...


I also added two field types


  


  




  




  


Now, since I want to make suggestions from multiple fields and I can't
declare two fields, I defined:

and copied three of the fields using:

Problems:
1. Everything loads pretty well, but copying the fields to a new field
just inflates my index. Is there a possibility to define the suggester on
more than one field?
2. I can't understand the results. Querying

 http://127.0.0.1:8983/solr/Book/suggest?q=th

returns docs such as
"that are labelled in black on a black background a little black light",
though querying

 http://127.0.0.1:8983/solr/vault-Book/suggest?q=lab

doesn't return anything, even though "lab" is found in the previous result.
What is the problem?


Problem parsing suggest response

2013-09-02 Thread Mysurf Mail
Hi,
I am having problems parsing suggest json response in c#.
Here is an example

{
  responseHeader: {
    status: 0,
    QTime: 1
  },
  spellcheck: {
    suggestions: [
      "at",
      {
        numFound: 1,
        startOffset: 1,
        endOffset: 3,
        suggestion: [
          "atrion"
        ]
      },
      "l",
      {
        numFound: 2,
        startOffset: 4,
        endOffset: 5,
        suggestion: [
          "lot",
          "loadtest_template_700"
        ]
      },
      "collation",
      ""atrion lot""
    ]
  }
}

1. Is this valid JSON? Shouldn't every item be surrounded by quotation
marks?
2. The items "at" and "l" are not preceded by a name.
(This generates different XML in every online json-to-xml translator.)
Is this standard JSON?
Can I influence the structure?

Thanks.
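For what it's worth: the raw wt=json response is valid JSON (the unquoted keys above are how a browser JSON viewer renders it), and the part that trips up parsers is that Solr serializes its internal NamedList as a flat array of alternating names and values. Requesting json.nl=map makes Solr emit a plain object instead; otherwise the flat form can be paired up client-side, sketched here in Python with the values from the response hard-coded:

```python
# The flat "suggestions" array from the response above, as Python literals.
suggestions = [
    "at", {"numFound": 1, "startOffset": 1, "endOffset": 3,
           "suggestion": ["atrion"]},
    "l",  {"numFound": 2, "startOffset": 4, "endOffset": 5,
           "suggestion": ["lot", "loadtest_template_700"]},
    "collation", '"atrion lot"',
]

# Even indexes are names, odd indexes are the matching values.
paired = dict(zip(suggestions[0::2], suggestions[1::2]))

print(paired["at"]["suggestion"])   # the suggestions for the term "at"
print(paired["collation"])          # the collated query
```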


solr suggestion -

2013-09-02 Thread Mysurf Mail
the following request
http://127.0.0.1:8983/solr/vault/suggest?wt=json&q=at%20l


returns phrases that start with "at" and with "l" (as shown below).
Now, what if I want phrases that start with "At l", such as "At least..."?
Thanks.


{
  responseHeader: {
    status: 0,
    QTime: 1
  },
  spellcheck: {
    suggestions: [
      "at",
      {
        numFound: 1,
        startOffset: 1,
        endOffset: 3,
        suggestion: [
          "atrion"
        ]
      },
      "l",
      {
        numFound: 2,
        startOffset: 4,
        endOffset: 5,
        suggestion: [
          "lot",
          "loadtest_template_700"
        ]
      },
      "collation",
      ""atrion lot""
    ]
  }
}


Solr suggest - How to define solr suggest as case insensitive

2013-09-08 Thread Mysurf Mail
My suggest (spellchecker) is returning case-sensitive answers. (I use it for
autocomplete - "dog" and "Dog" return different phrases.)

My suggest is defined as follows, in solrconfig.xml:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>



in schema



and



and


  




  



Re: solr suggestion -

2013-09-10 Thread Mysurf Mail
Yes.
I understood that from the result.
But how do I change that behaviour?

"Don't do any analysis on the field you are using for suggestion"

Please elaborate.


On Mon, Sep 9, 2013 at 8:48 PM, tamanjit.bin...@yahoo.co.in <
tamanjit.bin...@yahoo.co.in> wrote:

> Don't do any analysis on the field you are using for suggestion. What is
> happening here is that query time and indexing time the tokens are being
> broken on white space. So effectively, "at" is being taken as one token and
> "l" is being taken as another token for which you get two different
> suggestions.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-suggestion-tp4087841p4088919.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
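A minimal illustration of the point above - with whitespace analysis the suggester sees two independent tokens, while an unanalyzed (keyword-style) field keeps the input as one prefix:

```python
query = "at l"

# Whitespace analysis: the suggester sees two separate tokens and
# returns suggestions for "at" and for "l" independently.
whitespace_tokens = query.split()

# Keyword (no analysis): the whole string is one token, so the
# suggester can complete it to phrases like "at least ..." (after
# lowercasing, if case-insensitive matching is wanted).
keyword_tokens = [query]

print(whitespace_tokens)
print(keyword_tokens)
```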


Re: Solr suggest - How to define solr suggest as case insensitive

2013-09-10 Thread Mysurf Mail
I have added it and it didn't work. It still returns different results for
q=C and q=c.


On Tue, Sep 10, 2013 at 1:52 AM, Chris Hostetter
wrote:

>
> : This is probably because your dictionary is made up of all lower case
> tokens,
> : but when you query the spell-checker similar analysis doesnt happen.
> Ideal
> : case would be when you query the spellchecker you send lower case queries
>
> You can init the SpellCheckComponent with a "queryAnalyzerFieldType"
> option that will control what analysis happens.  ie...
>
>   <str name="queryAnalyzerFieldType">phrase_suggest</str>
>
>
> ...it would be nice if this defaulted to using the fieldType of hte field
> you configure on the Suggester, but not all Impls are based on the index
> (you might be using an external dict file) so it has to be explicitly
> configured, and defaults to using a simple WhitespaceAnalyzer.
>
>
> -Hoss
>


Solr Suggester - How do I filter autocomplete results

2013-09-10 Thread Mysurf Mail
I want to filter the autocomplete results from my suggester. Let's say I
have a book table:

Table (Id Guid, BookName String, BookOwner id)

I want each user to get a list to autocomplete from of its own books.

I want to add something like:

http://.../solr/vault/suggest?q=c&fq=BookOwner:3

This doesn't work. What other ways do I have to implement it?
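One caveat worth knowing: the spellcheck-based Suggester builds its dictionary from the whole index, so fq has no effect on it. A common workaround is to autocomplete via faceting instead, since facet counts do honor filter queries; a sketch of such a request in Python, using the field names from the question (the BookName field would need suitable indexing, e.g. lowercased terms):

```python
from urllib.parse import urlencode

params = urlencode({
    "q": "*:*",
    "fq": "BookOwner:3",        # restrict candidates to this user's books
    "rows": 0,                  # only the facet counts are needed, no docs
    "facet": "on",
    "facet.field": "BookName",
    "facet.prefix": "c",        # the prefix the user has typed
})
url = "http://localhost:8983/solr/vault/select?" + params
print(url)
```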


Solr doesnt return answer when searching numbers

2013-09-10 Thread Mysurf Mail
I am querying using

http://...:8983/solr/vault/select?q="design test"&fl=PackageName

I get 3 results:

   - design test
   - design test 2013
   - design test for jobs

Now when I query using q="test for jobs"
I get only "design test for jobs".

But when I query using q=2013

http://...:8983/solr/vault/select?q=2013&fl=PackageName

I get no results. Why doesn't it return an answer when I query with numbers?

In schema xml

 


How to define facet.prefix as case-insensitive

2013-09-22 Thread Mysurf Mail
I am using facet.prefix for auto complete.
This is my definition

 
 
  explicit
  ...
  true
  on
  Suggest


this is my field



and

 
  


  


All works fine, but when I search using caps lock it doesn't return answers,
even when the field contains capital letters.

I assume that the field in Solr is lowercased (by the field type's filter
definition) but the search term is not.
How can I control the search term's casing?

Thanks.


Re: How to define facet.prefix as case-insensitive

2013-09-23 Thread Mysurf Mail
thanks.


On Sun, Sep 22, 2013 at 6:24 PM, Erick Erickson wrote:

> You'll have to lowercase the term in your app and set
> terms.prefix to that value, there's no analysis done
> on the terms.prefix value.
>
> Best,
> Erick
>
> On Sun, Sep 22, 2013 at 4:07 AM, Mysurf Mail 
> wrote:
> > I am using facet.prefix for auto complete.
> > This is my definition
> >
> >  
> >  
> >   explicit
> >   ...
> >   true
> >   on
> >   Suggest
> > 
> >
> > this is my field
> >
> >  > required="false" multiValued="true"/>
> >
> > and
> >
> >  
> >   
> > 
> > 
> >   
> > 
> >
> > all works fine but when I search using caps lock it doesn't return
> answers.
> > Even when the field contains capitals letters - it doesn't.
> >
> > I assume that the field in solr is lowered (from the field type filter
> > definition) but the search term is not.
> > How can I control the search term caps/no caps?
> >
> > Thanks.
>
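Erick's suggestion - lowercase the prefix in the application, since no analysis is applied to facet.prefix - can be sketched like this (endpoint and field name taken from the thread):

```python
from urllib.parse import urlencode

user_typed = "Sug"                      # whatever the user typed, any casing

params = urlencode({
    "q": "*:*",
    "rows": 0,
    "facet": "on",
    "facet.field": "Suggest",
    "facet.prefix": user_typed.lower(), # match the lowercased index terms
})
url = "http://localhost:8983/solr/select?" + params
print(url)
```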


solr - searching part of words

2013-09-23 Thread Mysurf Mail
My field is defined as



*text_en is defined as in the original schema.xml that comes with Solr.

Now, my field has the following values:

   - "one"
   - "one1"

Searching for "one" returns only the field "one". What causes this? How can I
change it?


installing & configuring solr over ms sql server - tutorial needed

2013-05-31 Thread Mysurf Mail
I am trying to configure Solr over MS SQL Server.
I found only this tutorial, which
is a bit old (2011).
Is there an updated / official tutorial?


Re: installing & configuring solr over ms sql server - tutorial needed

2013-05-31 Thread Mysurf Mail
Thanks.
A tutorial on getting Solr over MSSQL?
I didn't find one, even with Jetty.



On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch
wrote:

> You have two mostly-separate issues here. Running Solr in Tomcat and
> indexing MSSql server.
>
> Try just running a default embedded-Jetty example until you get data
> import sorted out. Then, you can worry about Tomcat. And it would be
> easier to help with one problem at a time.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail 
> wrote:
> > I am trying to config solr over ms sql server.
> > I found only this tutorial
> > <http://www.chrisumbel.com/article/lucene_solr_sql_server>whih
> > is a bit old (2011)
> > Is there an updated  / formal tutorial?
>


Re: installing & configuring solr over ms sql server - tutorial needed

2013-05-31 Thread Mysurf Mail
For instance, step 5 - download and install a SQL Server JDBC driver.
Where do I put it when using Jetty?

* I just asked here whether an official tutorial for MS SQL Server exists
before I try to go through several tutorials.



On Fri, May 31, 2013 at 6:42 PM, Alexandre Rafalovitch
wrote:

> What's wrong with the one you found. Just ignore steps 1-4 and go
> right into driver and DIH setup. If you hit any problems, you now have
> a specific question to ask.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Fri, May 31, 2013 at 11:29 AM, Mysurf Mail 
> wrote:
> > Thanks.
> > A tutorial on getting solr over mssql ?
> > I didnt find it even with jetty
> >
> >
> >
> > On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch
> > wrote:
> >
> >> You have two mostly-separate issues here. Running Solr in Tomcat and
> >> indexing MSSql server.
> >>
> >> Try just running a default embedded-Jetty example until you get data
> >> import sorted out. Then, you can worry about Tomcat. And it would be
> >> easier to help with one problem at a time.
> >>
> >> Regards,
> >>Alex.
> >> Personal blog: http://blog.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all
> >> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> >> book)
> >>
> >>
> >> On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail 
> >> wrote:
> >> > I am trying to config solr over ms sql server.
> >> > I found only this tutorial
> >> > <http://www.chrisumbel.com/article/lucene_solr_sql_server>whih
> >> > is a bit old (2011)
> >> > Is there an updated  / formal tutorial?
> >>
>


Re: installing & configuring solr over ms sql server - tutorial needed

2013-05-31 Thread Mysurf Mail
btw: The other stages still refer to locations relative to Tomcat.


On Sat, Jun 1, 2013 at 12:02 AM, Mysurf Mail  wrote:

> for instance step 5 - Download and install a SQL Server JDBC drive.
> Where do I put it when using jetty?
>
> * Just asked here a question if an official  tutorial for ms sql server
> exists before I try to go through several tutorials.
>
>
>
> On Fri, May 31, 2013 at 6:42 PM, Alexandre Rafalovitch  > wrote:
>
>> What's wrong with the one you found. Just ignore steps 1-4 and go
>> right into driver and DIH setup. If you hit any problems, you now have
>> a specific question to ask.
>>
>> Regards,
>>Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Fri, May 31, 2013 at 11:29 AM, Mysurf Mail 
>> wrote:
>> > Thanks.
>> > A tutorial on getting solr over mssql ?
>> > I didnt find it even with jetty
>> >
>> >
>> >
>> > On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch
>> > wrote:
>> >
>> >> You have two mostly-separate issues here. Running Solr in Tomcat and
>> >> indexing MSSql server.
>> >>
>> >> Try just running a default embedded-Jetty example until you get data
>> >> import sorted out. Then, you can worry about Tomcat. And it would be
>> >> easier to help with one problem at a time.
>> >>
>> >> Regards,
>> >>Alex.
>> >> Personal blog: http://blog.outerthoughts.com/
>> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> >> - Time is the quality of nature that keeps events from happening all
>> >> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> >> book)
>> >>
>> >>
>> >> On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail 
>> >> wrote:
>> >> > I am trying to config solr over ms sql server.
>> >> > I found only this tutorial
>> >> > <http://www.chrisumbel.com/article/lucene_solr_sql_server>whih
>> >> > is a bit old (2011)
>> >> > Is there an updated  / formal tutorial?
>> >>
>>
>
>


Re: installing & configuring solr over ms sql server - tutorial needed

2013-05-31 Thread Mysurf Mail
Hi,
I am still having a problem with this
tutorial, trying
to get Solr running on Tomcat.
In step 4, when I copy apache-solr-1.4.0\example\solr to my Tomcat dir, I get
a folder with a bin and a collection1 folder.
Do I need them?
Should I create conf under solr or under collection1?
I don't have any solrconfig or schema files under solr, only under
collection1.



On Sat, Jun 1, 2013 at 12:26 AM, bbarani  wrote:

> solrconfig.xml - the lib directives specified in the configuration file are
> the lib locations where Solr would look for the jars.
>
> solr.xml - In case of the Multi core setup, you can have a sharedLib for
> all
> the collections. You can add the jdbc driver into the sharedLib folder.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/installing-configuring-solr-over-ms-sql-server-tutorial-needed-tp4067344p4067465.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: installing & configuring solr over ms sql server - tutorial needed

2013-06-01 Thread Mysurf Mail
My problem was with SQL Server.
This is a
great step-by-step guide.


On Sat, Jun 1, 2013 at 2:06 AM, bbarani  wrote:

> Why dont you follow this one tutorial to set the SOLR on tomcat..
>
> http://wiki.apache.org/solr/SolrTomcat
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/installing-configuring-solr-over-ms-sql-server-tutorial-needed-tp4067344p4067488.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Estimating the required volume to

2013-06-01 Thread Mysurf Mail
Hi,

I am just starting to learn about Solr.
I want to test it in my environment, working with MS SQL Server.

I have followed the tutorial and imported some rows into Solr.
Now I have a few noob questions regarding the benefits of implementing Solr
in a SQL environment.

1. As I understand it, when I send a query request over HTTP, I receive a
result with IDs from the Solr system, and then I query the full object row
from the DB.
Is that right?
Is there a comparison with MS SQL full-text search, which retrieves the
full object in the same select?
Is there a comparison that relates to DB/server clusters and multiple
machines?
2. Is there a technique that will assist me in estimating the volume size I
will need for the indexed data (obviously, based on the indexed data's
properties)?


Clearing a specific index / all indice

2013-06-02 Thread Mysurf Mail
I am running Solr with two cores in solr.xml.
One is product (imported from the DB) and one is collection1 (from the tutorial).

Now, in order to clear the index I run

http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>

http://localhost:8983/solr/update?stream.body=<commit/>

Only the "collection1" core (from the tutorial) is cleared.

How can I clear a specific index?

How can I clear all indices?

Thanks.
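A likely reason only collection1 is cleared: /solr/update without a core name in the path goes to the default core. Addressing each core explicitly - /solr/<core>/update - targets a specific index, and looping over the cores clears them all. A Python sketch building the requests (core names from the question):

```python
from urllib.parse import quote

base = "http://localhost:8983/solr"
# URL-encode the delete-all and commit command bodies.
delete_all = quote("<delete><query>*:*</query></delete>")
commit = quote("<commit/>")

for core in ("product", "collection1"):
    print(f"{base}/{core}/update?stream.body={delete_all}")
    print(f"{base}/{core}/update?stream.body={commit}")
```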


word stem

2013-06-02 Thread Mysurf Mail
Using Solr over my SQL DB I query the following:

http://localhost:8983/solr/products/select?q=require&wt=xml&indent=true&fl=*,score

where the queried word "require" should be found in the index, since I
imported the following:

"Each frame is hand-crafted in our Bothell facility to the optimum diameter
and wall-thickness *required* of a premium mountain frame. The heat-treated
welded aluminum frame has a larger diameter tube that absorbs the bumps."

required != require

I tried it in the analysis tool in the admin portal for debugging, and in the
field value I see the PST (stem) filter does make a token of "required" as
"requir".
I write "required" in the debug query field, and when I click on Analyse
Values I see "requir" is highlighted.


But the HTTP query only returns values when I query "required", not "require".


Thanks.


Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Thanks for your answer.
Can you please elaborate on
"mssql text searching is pretty primitive compared to Solr"?
(A link or anything.)
Thanks.


On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson wrote:

> 1> Maybe, maybe not. mssql text searching is pretty primitive
> compared to Solr, just as Solr's db-like operations are
> primitive compared to mssql. They address different use-cases.
>
> So, you can store the docs in Solr and not touch your SQL db
> at all to return the docs. You can store just the IDs in Solr and
> retrieve your docs from the SQL store. You can store just
> enough data in Solr to display the results page and when the user
> tries to drill down you can go to your SQL database for assembling
> the full document. You can. It all depend on your use case, data
>size, all that rot.
>
>Very often, something like the DB is considered the system-of-record
>and it's indexed to Solr (See DIH or SolrJ) periodically.
>
>   There is no underlying connection between your SQL store and Solr.
>   You control when data is fetched from SQL and put into Solr. You
>control what the search experience is. etc.
>
> 2> Not really :(.  See:
>
> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best
> Erick
>
> On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail  wrote:
> > Hi,
> >
> > I am just starting to learn about solr.
> > I want to test it in my env working with ms sql server.
> >
> > I have followed the tutorial and imported some rows to the Solr.
> > Now I have a few noob question regarding the benefits of implementing
> Solr
> > on a sql environment.
> >
> > 1. As I understand, When I send a query request over http, I receive a
> > result with ID from the Solr system and than I query the full object row
> > from the db.
> > Is that right?
> > Is there a comparison  next to ms sql full text search which retrieves
> the
> > full object in the same select?
> > Is there a comparison that relates to db/server cluster and multiple
> > machines?
> > 2. Is there a technic that will assist me to estimate the volume size I
> > will need for the indexed data (obviously, based on the indexed data
> > properties) ?
>


Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Hi,
Thanks for your answer.
I want to refer to your message, because I am trying to choose the right
tool.


1. Regarding stemming:
I am running in MS SQL

SELECT * FROM sys.dm_fts_parser ('FORMSOF(INFLECTIONAL,"provide")', 1033, 0, 0)

and I receive

group_id  phrase_id  occurrence  special_term  display_term  expansion_type  source_term
1         0          1           Exact Match   provided      2               provide
1         0          1           Exact Match   provides      2               provide
1         0          1           Exact Match   providing     2               provide
1         0          1           Exact Match   provide       0               provide

Isn't that stemming?
2. Regarding synonyms:
SQL Server has a full thesaurus
feature <http://msdn.microsoft.com/en-us/library/ms142491.aspx>.
Doesn't that mean synonyms?


On Mon, Jun 3, 2013 at 2:43 PM, Erick Erickson wrote:

> Here's a link to various transformations you can do
> while indexing and searching in Solr:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> Consider
> > stemming
> > ngrams
> > WordDelimiterFilterFactory
> > ASCIIFoldingFilterFactory
> > phrase queries
> > boosting
> > synonyms
> > blah blah blah
>
> You can't do a lot of these transformations, at least not easily
> in SQL. OTOH, you can't do 5-way joins in Solr. Different problems,
> different tools
>
> All that said, there's no good reason to use Solr if your use-case
> is satisfied by simple keyword searches that have no transformations,
> mysql etc. work just fine in those cases. It's all about selecting the
> right tool for the use-case.
>
> FWIW,
> Erick
>
> On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail  wrote:
> > Thanks for your answer.
> > Can you please elaborate on
> > "mssql text searching is pretty primitive compared to Solr"
> > (Link or anything)
> > Thanks.
> >
> >
> > On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson  >wrote:
> >
> >> 1> Maybe, maybe not. mssql text searching is pretty primitive
> >> compared to Solr, just as Solr's db-like operations are
> >> primitive compared to mssql. They address different use-cases.
> >>
> >> So, you can store the docs in Solr and not touch your SQL db
> >> at all to return the docs. You can store just the IDs in Solr and
> >> retrieve your docs from the SQL store. You can store just
> >> enough data in Solr to display the results page and when the user
> >> tries to drill down you can go to your SQL database for assembling
> >> the full document. You can. It all depend on your use case, data
> >>size, all that rot.
> >>
> >>Very often, something like the DB is considered the system-of-record
> >>and it's indexed to Solr (See DIH or SolrJ) periodically.
> >>
> >>   There is no underlying connection between your SQL store and Solr.
> >>   You control when data is fetched from SQL and put into Solr. You
> >>control what the search experience is. etc.
> >>
> >> 2> Not really :(.  See:
> >>
> >>
> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >>
> >> Best
> >> Erick
> >>
> >> On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail 
> wrote:
> >> > Hi,
> >> >
> >> > I am just starting to learn about solr.
> >> > I want to test it in my env working with ms sql server.
> >> >
> >> > I have followed the tutorial and imported some rows to the Solr.
> >> > Now I have a few noob question regarding the benefits of implementing
> >> Solr
> >> > on a sql environment.
> >> >
> >> > 1. As I understand, When I send a query request over http, I receive a
> >> > result with ID from the Solr system and than I query the full object
> row
> >> > from the db.
> >> > Is that right?
> >> > Is there a comparison  next to ms sql full text search which retrieves
> >> the
> >> > full object in the same select?
> >> > Is there a comparison that relates to db/server cluster and multiple
> >> > machines?
> >> > 2. Is there a technic that will assist me to estimate the volume size
> I
> >> > will need for the indexed data (obviously, based on the indexed data
> >> > properties) ?
> >>
>


Need assistance in defining solr to process user generated query text

2013-06-17 Thread Mysurf Mail
Hi,
I have been reading the Solr wiki pages and configured Solr successfully over
my flat table.
I have a few questions, though, regarding the querying and parsing of
user-generated text.

1. I have understood through this page that
I want to use dismax.
Through this page I can do it
using localparams.

But I think the best way is to define this in my XML files.
Can I do this?

2. In this tutorial
(Solr) the following query appears:

http://localhost:8983/solr/#/collection1/query?q=video

When I want to query my fact table, I have to query using *video*;
just video retrieves nothing.
How can I query it using video only?
3. On this page
it says that
"Extended DisMax is already configured in the example configuration, with
the name edismax",
but I see it only in the /browse requestHandler,
as follows:


<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    ...
    <str name="defType">edismax</str>

Do I also use it when I use select in my URL?

4. In general, I want to transfer user-generated text to my URL request
using the most standard rules (translating the "", +, and - signs into the q
parameter value).
What is the best way to do that?


Thanks.
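Regarding point 1: edismax can be selected per request with the defType parameter, or made the default by adding defType to a requestHandler's defaults in solrconfig.xml. A sketch of the per-request form in Python; the qf field names are assumptions:

```python
from urllib.parse import urlencode

params = urlencode({
    "q": "video",               # plain term - no wildcards needed once qf
    "defType": "edismax",       # points at analyzed text fields
    "qf": "Name Description",   # assumed field names - adjust to your schema
})
url = "http://localhost:8983/solr/collection1/select?" + params
print(url)
```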


Re: Need assistance in defining solr to process user generated query text

2013-06-17 Thread Mysurf Mail
I have one fact table with a lot of string columns and a few GUIDs just for
retrieval (not for search).



On Mon, Jun 17, 2013 at 6:01 PM, Jack Krupansky wrote:

> It sounds like you have your text indexed in a "string" field (why the
> wildcards are needed), or that maybe you are using the "keyword" tokenizer
> rather than the standard tokenizer.
>
> What is your default or query fields for dismax/edismax? And what are the
> field types for those fields?
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Monday, June 17, 2013 10:51 AM
> To: solr-user@lucene.apache.org
> Subject: Need assistance in defining solr to process user generated query
> text
>
>
> Hi,
> I have been reading solr wiki pages and configured solr successfully over
> my flat table.
> I have a few question though regarding the querying and parsing of user
> generated text.
>
> 1. I have understood through this page
> <http://wiki.apache.org/solr/DisMax> that
>
> I want to use dismax.
>Through this page
> <http://wiki.apache.org/solr/LocalParams> I can do it
>
> using localparams
>
>But I think the best way is to define this in my xml files.
>Can I do this?
>
> 2. In this tutorial
> <http://lucene.apache.org/solr/4_3_0/tutorial.html>
>
> (solr) the following query appears
>
>
> http://localhost:8983/solr/#/collection1/query?q=video
>
>When I want to query my fact table  I have to query using *video*.
>just video retrieves nothing.
>How can I query it using video only?
> 3. In this page
> <http://wiki.apache.org/solr/ExtendedDisMax#Configuration>
>
> it says that
> "Extended DisMax is already configured in the example configuration, with
> the name edismax"
> But I see it only in the /browse requestHandler
> as follows:
>
>
> 
> 
>   explicit
>...
>
>   edismax
>
> Do I use it also when I use select in my url ?
>
> 4. In general, I want to transfer a user generated text to my url request
> using the most standard rules (translate "",+,- signs to the q parameter
> value).
> What is the best way to
>
>
>
> Thanks.
>


How to define my data in schema.xml

2013-06-17 Thread Mysurf Mail
Hi,
I have created a flat table from my DB and defined a Solr core on it.
It works excellently so far.

My problem is that my table has two hierarchies, so when flattened it is too
big.
Let's consider the following example scenario.

My tables are:

School
Students (1:n with School)
Teachers (1:n with School)

Now, each school has many students and teachers, but each student/teacher
has another multivalue field, i.e. the following tables:

studentHobbies - 1:n with Students
teacherCourses - 1:n with Teachers

My main entity is School, and that is what I want to get in the result.
Flattening does not help me much and is very expensive.

Can you direct me to how I define 1:n relationships (and 1:n:n)
in data-config.xml?
Thanks.


Is there a way to encrypt username and pass in the solr config file

2013-06-17 Thread Mysurf Mail
Hi,
I want to encrypt (RSA, maybe?) my username/password in Solr.
I can't leave them as simple plain text on the server.
What is the recommended way?
Thanks.


Solr data files

2013-06-17 Thread Mysurf Mail
Where are the core data files located?
Can I just delete folders/files in order to quickly clean the cores/indexes?
Thanks


Re: How to define my data in schema.xml

2013-06-17 Thread Mysurf Mail
Thanks for your quick reply. Here are some notes:

1. Consider that all tables in my example have two columns, Name &
Description, which I would like to index and search.
2. I have no reason to create a flat table other than for Solr, so I
would like to see if I can avoid it.
3. If in my example I have a flat table, then obviously it will hold a
lot of rows for a single school.
By searching the exact school name I will likely receive a lot of rows
(my flat table has its own PK).
That is something I would like to avoid, and I thought I could avoid it
by defining teachers and students as multivalued or something like that,
and then teacherCourses and studentHobbies as 1:n respectively.
This is quite similar to my real-life demand, so I came here to get
some tips as a Solr noob.


On Mon, Jun 17, 2013 at 9:08 PM, Gora Mohanty  wrote:

> On 17 June 2013 21:39, Mysurf Mail  wrote:
> > Hi,
> > I have created a flat table from my DB and defined a solr core on it.
> > It works excellent so far.
> >
> > My problem is that my table has two hierarchies. So when flatted it is
> too
> > big.
>
> What do you mean by "too big"? Have you actually tried
> indexing the data into Solr, and does the performance
> not meet your needs, or are you guessing from the size
> of the tables?
>
> > Lets consider the following example scenario
> >
> > My Tables are
> >
> > School
> > Students (1:n with school)
> > Teachers(1:n with school)
> [...]
>
> Um, all of this crucially depends on what your 'n' is.
> Plus, you need to describe your use case in much
> more detail. At the moment, you are asking us to
> guess at what you are trying to do, which is inefficient,
> and unlikely to solve your problem.
>
> Regards,
> Gora
>


Re: Solr data files

2013-06-17 Thread Mysurf Mail
Thanks.


On Mon, Jun 17, 2013 at 10:42 PM, Alexandre Rafalovitch
wrote:

> The index files are under the the collection's directory in the
> subdirectory called 'data'. Right next to the directory called 'conf'
> where your schema.xml and solrconfig.xml live.
>
> If the Solr is not running, you can delete that directory to clear the
> index content. I don't think you can do that while Solr is running.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Mon, Jun 17, 2013 at 3:33 PM, Mysurf Mail 
> wrote:
> > Where are the core data files located?
> > Can I just delete folder/files in order to quick clean the core/indexes?
> > Thanks
>


Re: How to define my data in schema.xml

2013-06-17 Thread Mysurf Mail
Thanks for your reply.
I have tried the simplest approach and it works absolutely fantastically:
huge table, near-instant results.

A few problems, as I described earlier, that I am trying to solve:
1. I create a flat table just for Solr. This requires maintenance and
development. Can I run Solr over my regular tables?
That would be my simplest approach - working over my relational tables.
2. When you query a flat table by school name, as I described, if the
school has 300 students and 300 teachers, with 300 teacherCourses and 300
studentHobbies,
you get 8.1 billion rows (300*300*300*300). As I am sure this will work
great on Solr - searching for the school name will retrieve 8.1B rows.
3. Let's say all my searches are user-generated free-text searches over
the name and comments columns.
Thanks.


On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty  wrote:

> On 18 June 2013 01:10, Mysurf Mail  wrote:
> > Thanks for your quick reply. Here are some notes:
> >
> > 1. Consider that all tables in my example have two columns: Name &
> > Description which I would like to index and search.
> > 2. I have no other reason to create flat table other than for solar. So I
> > would like to see if I can avoid it.
> > 3. If in my example I will have a flat table then obviously it will hold
> a
> > lot of rows for a single school.
> > By searching the exact school name I will likely receive a lot of
> rows.
> > (my flat table has its own pk)
>
> Yes, all of this is definitely the case, but in practice
> it does not matter. Solr can efficiently search through
> millions of rows. To start with, just try the simplest
> approach, and only complicate things as and when
> needed.
>
> > That is something I would like to avoid and I thought I can avoid
> this
> > by defining teachers and students as multiple value or something like
> this
> > and than teacherCourses and studentHobbies  as 1:n respectively.
> > This is quite similiar to my real life demand, so I came here to get
> > some tips as a solr noob.
>
> You have still not described what are the searches that
> you would want to do. Again, I would suggest starting
> with the most straightforward approach.
>
> Regards,
> Gora
>


implementing identity authentication in SOLR

2013-06-18 Thread Mysurf Mail
Hi,
In order to add Solr to my prod environment I have to implement some
security restrictions.
Is there a way to add user/pass to the requests and to keep them
*encrypted* in a file?
Thanks.


Re: implementing identity authentication in SOLR

2013-06-18 Thread Mysurf Mail
Just to make sure.
In my previous question I was referring to the user/pass that queries the
db.

Now I was referring to the user/pass that i want for the solr http request.
Think of it as if my user sends a request where he filters documents created
by another user.
I want to restrict that.

I currently work in a .NET environment where we have an identity provider
that provides trusted claims to the HTTP request.
In similar situations I take the user name property from a trusted claim
and not from a parameter in the URL.

I want to know how Solr can restrict its HTTP requests/responses.
Thank you.


On Tue, Jun 18, 2013 at 10:56 AM, Gora Mohanty  wrote:

> On 18 June 2013 13:10, Mysurf Mail  wrote:
> > Hi,
> > In order to add solr to my prod environmnet I have to implement some
> > security restriction.
> > Is there a way to add user/pass to the requests and to keep them
> > *encrypted*in a file.
>
> As mentioned earlier, no there is no built-in way of doing that
> if you are using the Solr DataImportHandler.
>
> Probably the easiest way would be to implement your own
> indexing using a library like SolrJ. Then, you can handle encryption
> as you wish.
>
> Regards,
> Gora
>


Re: Need assistance in defining solr to process user generated query text

2013-06-18 Thread Mysurf Mail
great tip :-)


On Tue, Jun 18, 2013 at 2:36 PM, Erick Erickson wrote:

> if the _solr_ type is "string", then you aren't getting any
> tokenization, so "my dog has fleas" is indexed as
> "my dog has fleas", a single token. To search
> for individual words you need to use, say, the
> "text_general" type, which would index
> "my" "dog" "has" "fleas"
>
> Best
> Erick
>
> On Mon, Jun 17, 2013 at 11:26 AM, Mysurf Mail 
> wrote:
> > I have one fact table with a lot of string columns and a few GUIDs just
> for
> > retreival (Not for search)
> >
> >
> >
> > On Mon, Jun 17, 2013 at 6:01 PM, Jack Krupansky  >wrote:
> >
> >> It sounds like you have your text indexed in a "string" field (why the
> >> wildcards are needed), or that maybe you are using the "keyword"
> tokenizer
> >> rather than the standard tokenizer.
> >>
> >> What is your default or query fields for dismax/edismax? And what are
> the
> >> field types for those fields?
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Mysurf Mail
> >> Sent: Monday, June 17, 2013 10:51 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Need assistance in defining solr to process user generated
> query
> >> text
> >>
> >>
> >> Hi,
> >> I have been reading solr wiki pages and configured solr successfully
> over
> >> my flat table.
> >> I have a few question though regarding the querying and parsing of user
> >> generated text.
> >>
> >> 1. I have understood through this <http://wiki.apache.org/solr/DisMax>
> >> page that
> >>
> >> I want to use dismax.
> >>Through this <http://wiki.apache.org/solr/LocalParams> page
> >> I can do it
> >>
> >> using localparams
> >>
> >>But I think the best way is to define this in my xml files.
> >>Can I do this?
> >>
> >> 2. In this <http://lucene.apache.org/solr/4_3_0/tutorial.html> tutorial
> >>
> >> (solr) the following query appears
> >>
> >>http://localhost:8983/solr/#/collection1/query?q=video
> >>
> >>When I want to query my fact table  I have to query using *video*.
> >>just video retrieves nothing.
> >>How can I query it using video only?
> >> 3. In this <http://wiki.apache.org/solr/ExtendedDisMax#Configuration> page
> >>
> >> it says that
> >> "Extended DisMax is already configured in the example configuration,
> with
> >> the name edismax"
> >> But I see it only in the /browse requestHandler
> >> as follows:
> >>
> >>
> >> 
> >> <requestHandler name="/browse" class="solr.SearchHandler">
> >>   <lst name="defaults">
> >>     <str name="echoParams">explicit</str>
> >>     ...
> >>     <str name="defType">edismax</str>
> >> Do I use it also when I use select in my url ?
> >>
> >> 4. In general, I want to transfer a user generated text to my url
> request
> >> using the most standard rules (translate "",+,- signs to the q parameter
> >> value).
> >> What is the best way to
> >>
> >>
> >>
> >> Thanks.
> >>
>


Re: Is there a way to encrypt username and pass in the solr config file

2013-06-18 Thread Mysurf Mail
@Gora: yes.
User name and pass.


On Tue, Jun 18, 2013 at 2:57 PM, Gora Mohanty  wrote:

> On 18 June 2013 17:16, Erick Erickson  wrote:
> > What do you mean "encrypt"? The stored value?
> > the indexed value? Over the wire?
> [...]
>
> My understanding was that he wanted to encrypt the
> username/password in the DIH configuration file.
> "Mysurf Mail", could you please clarify?
>
> Regards,
> Gora
>


Re: How to define my data in schema.xml

2013-06-18 Thread Mysurf Mail
Hi Jack,
Thanks for your kind comment.

I am truly in the beginning of data modeling my schema over an existing
working DB.
I have used the school-teachers-student db as an example scenario.
(a. I have written it as a disclaimer in my first post; b. I really do not
know anyone that has 300 hobbies either.)

In real life my db is obviously much different,
I just used this as an example of potential pitfalls that will occur if I
use my old DB data modeling notions;
obviously, the old relational modeling idioms do not apply here.

Now, my question was referring to the fact that I would really like to
avoid a flat table/join/view because of the reason listed above.
So, my scenario is answering a plain user-generated text search over an
MSSQL DB that contains a few 1:n relations (and a few 1:n:n relationships).

So, I came here for tips. Should I use one combined index (treat it as a
NoSQL source), separate indices, or some other way to define relational
data?
Thanks.



On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky wrote:

> It sounds like you still have a lot of work to do on your data model. No
> matter how you slice it, 8 billion rows/fields/whatever is still way too
> much for any engine to search on a single server. If you have 8 billion of
> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
> plan ahead to put more than 100 million rows on a single node; plan on a
> proof of concept implementation to determine that number.
>
> When we in Solr land say "flattened" or "denormalized", we mean in an
> intelligent, "smart", thoughtful sense, not a mindless, mechanical
> flattening. It is an opportunity for you to reconsider your data models,
> both old and new.
>
> Maybe data modeling is beyond your skill set. If so, have a chat with your
> boss and ask for some assistance, training, whatever.
>
> Actually, I am suspicious of your 8 billion number - change each of those
> 300's to realistic, average numbers. Each teacher teaches 300 courses?
> Right. Each Student has 300 hobbies? If you say so, but...
>
> Don't worry about schema.xml until you get your data model under control.
>
> For an initial focus, try envisioning the use cases for user queries. That
> will guide you in thinking about how the data would need to be organized to
> satisfy those user queries.
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 2:20 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
>
> Thanks for your reply.
> I have tried the simplest approach and it works absolutely fantastic.
> Huge table - 0s to result.
>
> two problems as I described earlier, and that is what I try to solve:
> 1. I create a flat table just for solar. This requires maintenance and
> develop. Can I run solr over my regular tables?
>This is my simplest approach. Working over my relational tables,
> 2. When you query a flat table by school name, as I described, if the
> school has 300 student, 300 teachers, 300  with 300 teacherCourses, 300
> studentHobbies,
>you get 8.1 Billion rows (300*300*300*300). As I am sure this will work
> great on solar - searching for the school name will retrieve 8.1 B rows.
> 3. Lets say all my searches are user generated free text search that is
> searching name and comments columns.
> Thanks.
>
>
> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty  wrote:
>
>  On 18 June 2013 01:10, Mysurf Mail  wrote:
>> > Thanks for your quick reply. Here are some notes:
>> >
>> > 1. Consider that all tables in my example have two columns: Name &
>> > Description which I would like to index and search.
>> > 2. I have no other reason to create flat table other than for solar. So
>> > I
>> > would like to see if I can avoid it.
>> > 3. If in my example I will have a flat table then obviously it will hold
>> a
>> > lot of rows for a single school.
>> > By searching the exact school name I will likely receive a lot of
>> rows.
>> > (my flat table has its own pk)
>>
>> Yes, all of this is definitely the case, but in practice
>> it does not matter. Solr can efficiently search through
>> millions of rows. To start with, just try the simplest
>> approach, and only complicate things as and when
>> needed.
>>
>> > That is something I would like to avoid and I thought I can avoid
>> this
>> > by defining teachers and students as multiple value or something like
>> this
>> > and than teacherCourses and studentHobbies  as 1:n respectively.
>> > This is quite similiar to my real life demand, so I came here to get
>> > some tips as a solr noob.
>>
>> You have still not described what are the searches that
>> you would want to do. Again, I would suggest starting
>> with the most straightforward approach.
>>
>> Regards,
>> Gora
>>
>>
>


Re: How to define my data in schema.xml

2013-06-19 Thread Mysurf Mail
Well,
Avoiding flattening the DB to a flat table sounds like a great plan.
I found this solution:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example

It imports a join, not a flat table.
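
A nested-entity sketch of that approach for the earlier school example; the
driver, URL, table, and column names here are assumptions for illustration,
not something from this thread:

```xml
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=Schools"
              user="sa" password="..."/>
  <document>
    <!-- one Solr document per school -->
    <entity name="school" query="select Id, Name, Description from School">
      <!-- each child entity fills a multivalued field on the school doc -->
      <entity name="student"
              query="select Name as StudentName from Student where SchoolId='${school.Id}'"/>
      <entity name="teacher"
              query="select Name as TeacherName from Teacher where SchoolId='${school.Id}'"/>
    </entity>
  </document>
</dataConfig>
```

Each inner query runs once per outer row, which keeps the index normalized
without building an 8.1B-row cross product first.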



On Tue, Jun 18, 2013 at 5:53 PM, Jack Krupansky wrote:

> You can in fact have multiple collections in Solr and do a limited amount
> of joining, and Solr has multivalued fields as well, but none of those
> techniques should be used to avoid the process of flattening and
> denormalizing a relational data model. It is hard work, but yes, it is
> required to use Solr effectively.
>
> Again, start with the queries - what problem are you trying to solve.
> Nobody stores data just for the sake of storing it - how will the data be
> used?
>
>
> -- Jack Krupansky
>
> -----Original Message- From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 9:58 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
> Hi Jack,
> Thanks, for you kind comment.
>
> I am truly in the beginning of data modeling my schema over an existing
> working DB.
> I have used the school-teachers-student db as an example scenario.
> (a, I have written it as a disclaimer in my first post. b. I really do not
> know anyone that has 300 hobbies too.)
>
> In real life my db is obviously much different,
> I just used this as an example of potential pitfalls that will occur if I
> use my old db data modeling notions.
> obviously, the old relational modeling idioms do not apply here.
>
> Now, my question was referring to the fact that I would really like to
> avoid a flat table/join/view because of the reason listed above.
> So, my scenario is answering a plain user generated text search over a
> MSSQLDB that contains a few 1:n relation (and a few 1:n:n relationship).
>
> So, I come here for tips. Should I use one combined index (treat it as a
> nosql source) or separate indices or another. any other ways to define
> relation data ?
> Thanks.
>
>
>
> On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky wrote:
>
>  It sounds like you still have a lot of work to do on your data model. No
>> matter how you slice it, 8 billion rows/fields/whatever is still way too
>> much for any engine to search on a single server. If you have 8 billion of
>> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
>> plan ahead to put more than 100 million rows on a single node; plan on a
>> proof of concept implementation to determine that number.
>>
>> When we in Solr land say "flattened" or "denormalized", we mean in an
>> intelligent, "smart", thoughtful sense, not a mindless, mechanical
>> flattening. It is an opportunity for you to reconsider your data models,
>> both old and new.
>>
>> Maybe data modeling is beyond your skill set. If so, have a chat with your
>> boss and ask for some assistance, training, whatever.
>>
>> Actually, I am suspicious of your 8 billion number - change each of those
>> 300's to realistic, average numbers. Each teacher teaches 300 courses?
>> Right. Each Student has 300 hobbies? If you say so, but...
>>
>> Don't worry about schema.xml until you get your data model under control.
>>
>> For an initial focus, try envisioning the use cases for user queries. That
>> will guide you in thinking about how the data would need to be organized
>> to
>> satisfy those user queries.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Mysurf Mail
>> Sent: Tuesday, June 18, 2013 2:20 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to define my data in schema.xml
>>
>>
>> Thanks for your reply.
>> I have tried the simplest approach and it works absolutely fantastic.
>> Huge table - 0s to result.
>>
>> two problems as I described earlier, and that is what I try to solve:
>> 1. I create a flat table just for solar. This requires maintenance and
>> develop. Can I run solr over my regular tables?
>>This is my simplest approach. Working over my relational tables,
>> 2. When you query a flat table by school name, as I described, if the
>> school has 300 student, 300 teachers, 300  with 300 teacherCourses, 300
>> studentHobbies,
>>you get 8.1 Billion rows (300*300*300*300). As I am sure this will work
>> great on solar - searching for the school name will retrieve 8.1 B rows.
>> 3. Lets say all my searches are user generated free text search that is
>> searching name and comments columns.
>> Thanks.
>>
>>
>> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty  wrote:
>>
>

modeling multiple values on 1:n connection

2013-06-22 Thread Mysurf Mail
I am trying to model my DB using this example from the Solr wiki:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example

I have a table called item and a table called features with
id, featureName, and description.

here is the updated xml (added featureName):

<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver"
                url="jdbc:hsqldb:/temp/example/ex" user="sa" />
    <document>
        <entity name="item" query="select * from item">
            <entity name="feature"
                    query="select featureName, description from feature where item_id='${item.ID}'"/>
        </entity>
    </document>
</dataConfig>
Now I get two lists in the doc XML element:



number of miles in every direction the universal cataclysm was
gathering All around the Restaurant people and things relaxed
and chatted. The - Do we have... - he put up a hand to hold back
the cheers, - Do we  
to a stupefying climax. Glancing at his watch, Max returned to the
stage air was filled with talk of this and that, and with the
mingled scents of have a party here from the Zansellquasure
Flamarion Bridge Club from
.

But I would like to see the lists together (using XML attributes) so that I
don't have to join the values.
Is that possible?


Re: modeling multiple values on 1:n connection

2013-06-23 Thread Mysurf Mail
Thanks for your comment.
What I need is to model it so that I can connect the featureName with
the feature description of the same feature.
Currently, if an item has 3 features I get two lists, each three elements
long.
But then I need to correlate them.


On Sun, Jun 23, 2013 at 9:25 AM, Gora Mohanty  wrote:

> On 23 June 2013 01:31, Mysurf Mail  wrote:
> > I try to model my db using
> > this <http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example> example
> > from solr wiki.
> >
> > I have a table called item and a table called features with
> > id,featureName,description
> >
> > here is the updated xml (added featureName)
> >
> > <dataConfig>
> >   <dataSource driver="org.hsqldb.jdbcDriver"
> >     url="jdbc:hsqldb:/temp/example/ex" user="sa" />
> >   <document>
> >     <entity name="item" query="select * from item">
> >       <entity name="feature"
> >         query="select featureName, description from feature where item_id='${item.ID}'"/>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >
> > Now I get two lists in the xml element
> >
> > 
> > 
> > number of miles in every direction the universal cataclysm was
> > gathering All around the Restaurant people and things relaxed
> > and chatted. The - Do we have... - he put up a hand to hold
> back
> > the cheers, - Do we  
> > to a stupefying climax. Glancing at his watch, Max returned to the
> > stage air was filled with talk of this and that, and with the
> > mingled scents of have a party here from the Zansellquasure
> > Flamarion Bridge Club from
> > .
> > 
> > But I would like to see the list together (using xml attributes) so that
> I
> > dont have to join the values.
> > Is it possible?
>
> While it is not clear to me what you are asking, I am
> guessing that you do not want the featureName and
> description fields to appear as arrays. This is happening
> because you have defined them as multi-valued in the
> Solr schema. What exactly do you want to "join" here?
>
> Regards,
> Gora
>
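
One workaround, sketched here as an assumption rather than something from
this thread, is to pair the two columns in the inner entity's SQL so each
element of a single multivalued field already carries both values:

```xml
<entity name="feature"
        query="select featureName || ': ' || description as feature
               from feature where item_id='${item.ID}'"/>
```

Each resulting feature value then reads "name: description", so no
client-side correlation of parallel arrays is needed.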


Need assistance in defining search urls

2013-06-24 Thread Mysurf Mail
Now, each doc looks like this (I generated random user text in the free-text
columns in the DB):
 We have located the ship.  d1771fc0-d3c2-472d-aa33-4bf5d1b79992 b2986a4f-9687-404c-8d45-57b073d900f7 
a99cf760-d78e-493f-a827-585d11a765f3 
ba349832-c655-4a02-a552-d5b76b45d58c 
35e86a61-eba8-49f4-95af-8915bd9561ac 
6d8eb7d9-b417-4bda-b544-16bc26ab1d85 
31453eff-be19-4193-950f-fffcea70ef9e 
08e27e4f-3d07-4ede-a01d-4fdea3f7ddb0 
79a19a3f-3f1b-486f-9a84-3fb40c41e9c7 
b34c6f78-75b1-42f1-8ec7-e03d874497df  
1.7437795 
My searches are :
(PackageName is defined as the default search field)

1. I try to search for any package that name has the word "have" or "had"
or "has"
2. I try to search for any package that contains
d1771fc0-d3c2-472d-aa33-4bf5d1b79992

Therefore I use these searches:

1.
http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true

Questions:
1.a. Even if I display all results, I don't get any results with "has"
(inflections). Why?
1.b. What is the difference between *have* and have?
The score is different.

2.
http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300

Questions:
2.a. I get no result, even though I search on all fields (*) and the value
appears in the document.
2.b. If I want to search on more than one field, i.e. packageName &
description, what is the best way to do it?
Define all as default?
Thanks,
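
For 2.b, one option is edismax's qf parameter, which searches several fields
without redefining the default field; a sketch using the field names above
(the q value is just a sample term):

```
http://localhost:8983/solr/vault/select?defType=edismax&qf=PackageName+Description&q=ship
```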


What should be the definitions ( field type ) for a field that will be search with user free text

2013-06-24 Thread Mysurf Mail
Currently I am using text_general.
I want to support user free-text search, therefore I would like
tokenization, stemming, etc.
How do I define stemmers?
Should I use text_en instead of text_general?
Thank you.


Re: Need assistance in defining search urls

2013-06-24 Thread Mysurf Mail
Thanks Jack and Giovanni.
Jack:
Regarding 1.b, have vs *have*: the results were identical apart from the
score.
Basically I can't do all the stuff you recommended. I want a stemmer for an
unknown search (the query is sent when the user enters free text into a
textbox).

Giovanni - regarding the requestHandler named "test":
will I need to query using /test/...?
Shouldn't it be named "/test"?



On Mon, Jun 24, 2013 at 4:40 PM, Jack Krupansky wrote:

> "I don't get any results with "has "(inflections). Why?"
>
> Wildcard patterns on strings are literal, exact. There is no automatic
> natural language processing.
>
> You could try a regular expression match:
>
> q=/ ha(s|ve) /
>
> Or, just use OR:
>
> q=*has* OR *have*
>
> Or, use a copyField of the package name to a text field and then you can
> use simple keywords:
>
> q=package_name_text:(has OR have)
>
> Is PackageName a "string" field?
>
> Or, maybe best, use an update processor to populate a Boolean field to
> indicate whether the has/have pattern is seen in the package name. A simple
> JavaScript script with a StatelessScriptUpdateProcessor could do this in
> just a couple of lines and make the query much faster.
>
> For question 1.b the two queries seem identical - was that the case?
>
> There is no "*:" feature to query all fields in Solr - although the
> LucidWorks Search query parser does support that feature.
>
> -- Jack Krupansky
>
> -Original Message- From: Mysurf Mail
> Sent: Monday, June 24, 2013 7:26 AM
> To: solr-user@lucene.apache.org
> Subject: Need assistance in defining search urls
>
>
> Now, each doc looks like this (i generated random user text in the freetext
> columns in the DB)
>  We have located the ship.  "CatalogVendorPartNum"> d1771fc0-d3c2-472d-aa33-**4bf5d1b79992
> 
>> b2986a4f-9687-404c-8d45-**57b073d900f7 
>>
> a99cf760-d78e-493f-a827-**585d11a765f3 
> ba349832-c655-4a02-a552-**d5b76b45d58c 
> 35e86a61-eba8-49f4-95af-**8915bd9561ac 
> 6d8eb7d9-b417-4bda-b544-**16bc26ab1d85 
> 31453eff-be19-4193-950f-**fffcea70ef9e 
> 08e27e4f-3d07-4ede-a01d-**4fdea3f7ddb0 
> 79a19a3f-3f1b-486f-9a84-**3fb40c41e9c7 
> b34c6f78-75b1-42f1-8ec7-**e03d874497df  
> 1.7437795 
> My searches are :
> (PackageName is deined as default search)
>
> 1. I try to search for any package that name has the word "have" or "had"
> or "has"
> 2. I try to search for any package that consists
> d1771fc0-d3c2-472d-aa33-**4bf5d1b79992
>
> Therefore I use this searches
>
> 1.
> http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true
>
> questions :
> 1.a. even if i display all results, I dont get any results with "has "
> (inflections). Why?
> 1.b. what is the difference between
> *have*
> and have.
> the score is differnt.
>
> 2.
> http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300
>
> Questions:
> 2.a. I get no result. even though i search it on all fields. (*) and it
> appears in
> 2.b. If I want to search on more than one field i.e. packageName &
> description, what is the best way to do it?
> define all as default?
> Thanks,
>


Re: Need assistance in defining search urls

2013-06-24 Thread Mysurf Mail
Regarding
"There is no "*:" feature to query all fields in Solr"

When I enter the dashboard -> solr/#/[core]/query,
the default is "*:*"
and it brings back everything.


On Mon, Jun 24, 2013 at 5:41 PM, Mysurf Mail  wrote:

> Thanks Jack and Giovanni.
> Jack:
> Regarding 1.b. have vs *have* the results were identical apart from the
> score.
> Basically i cant do all the stuff you recommended. I want a stemmer for an
> unknown search (send the query when user enters free text to a textbox ).
>
> giovanni-  regarding requestHandler  test
> will I need to query using /test/...?
> shouldnt it be names "/test"?
>
> .
>
>
>
> On Mon, Jun 24, 2013 at 4:40 PM, Jack Krupansky 
> wrote:
>
>> "I don't get any results with "has "(inflections). Why?"
>>
>> Wildcard patterns on strings are literal, exact. There is no automatic
>> natural language processing.
>>
>> You could try a regular expression match:
>>
>> q=/ ha(s|ve) /
>>
>> Or, just use OR:
>>
>> q=*has* OR *have*
>>
>> Or, use a copyField of the package name to a text field and then you
>> can use simple keywords:
>>
>> q=package_name_text:(has OR have)
>>
>> Is PackageName a "string" field?
>>
>> Or, maybe best, use an update processor to populate a Boolean field to
>> indicate whether the has/have pattern is seen in the package name. A simple
>> JavaScript script with a StatelessScriptUpdateProcessor could do this in
>> just a couple of lines and make the query much faster.
>>
>> For question 1.b the two queries seem identical - was that the case?
>>
>> There is no "*:" feature to query all fields in Solr - although the
>> LucidWorks Search query parser does support that feature.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Mysurf Mail
>> Sent: Monday, June 24, 2013 7:26 AM
>> To: solr-user@lucene.apache.org
>> Subject: Need assistance in defining search urls
>>
>>
>> Now, each doc looks like this (i generated random user text in the
>> freetext
>> columns in the DB)
>>  We have located the ship. > "CatalogVendorPartNum"> d1771fc0-d3c2-472d-aa33-**4bf5d1b79992
>> >
>>> b2986a4f-9687-404c-8d45-**57b073d900f7 
>>>
>> a99cf760-d78e-493f-a827-**585d11a765f3 
>> ba349832-c655-4a02-a552-**d5b76b45d58c 
>> 35e86a61-eba8-49f4-95af-**8915bd9561ac 
>> 6d8eb7d9-b417-4bda-b544-**16bc26ab1d85 
>> 31453eff-be19-4193-950f-**fffcea70ef9e 
>> 08e27e4f-3d07-4ede-a01d-**4fdea3f7ddb0 
>> 79a19a3f-3f1b-486f-9a84-**3fb40c41e9c7 
>> b34c6f78-75b1-42f1-8ec7-**e03d874497df  
>> 1.7437795 
>> My searches are :
>> (PackageName is deined as default search)
>>
>> 1. I try to search for any package that name has the word "have" or "had"
>> or "has"
>> 2. I try to search for any package that consists
>> d1771fc0-d3c2-472d-aa33-**4bf5d1b79992
>>
>> Therefore I use this searches
>>
>> 1.
>> http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true
>>
>> questions :
>> 1.a. even if i display all results, I dont get any results with "has "
>> (inflections). Why?
>> 1.b. what is the difference between
>> *have*
>> and have.
>>  the score is differnt.
>>
>> 2.
>> http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300
>>
>> Questions:
>> 2.a. I get no results, even though I search on all fields (*) and the
>> value appears in the doc.
>> 2.b. If I want to search on more than one field, i.e. PackageName &
>> Description, what is the best way to do it?
>> Should I define all of them as default?
>> Thanks,
>>
>
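For the multi-field question in 2.b, the usual edismax route is the qf parameter rather than redefining the default field; a sketch, assuming the fields are named PackageName and Description as in the thread:

    http://localhost:8983/solr/vault/select?q=have&defType=edismax&qf=PackageName+Description&fl=PackageName,score

With qf, edismax queries each listed field and scores documents on the best-matching field, so no single field has to be declared as the default.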
>


Why does the id have to be indexed

2013-06-24 Thread Mysurf Mail
Currently, I can't define my unique key with indexed="false".

As I understand from the docs, the indexed attribute should be true
only if I want the field to be searchable or sortable.

Let's say I have a schema with id and name only. Wouldn't I want the
following configuration?
id - indexed="false", stored="true"
name - indexed="true", stored="true"

I don't want the id to be searched, but I would want it to be defined as the
unique key and to be stored (for retrieval).
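A minimal schema.xml sketch of the configuration proposed above (the field types are assumptions, and whether Solr accepts a uniqueKey with indexed="false" depends on the version):

    <fields>
      <!-- unique key: stored for retrieval, but (as proposed) not searchable -->
      <field name="id" type="string" indexed="false" stored="true" required="true"/>
      <!-- name: searchable and retrievable -->
      <field name="name" type="text_general" indexed="true" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>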


Re: What should be the definition (field type) for a field that will be searched with user free text

2013-06-24 Thread Mysurf Mail
Thanks.


On Mon, Jun 24, 2013 at 5:52 PM, Jack Krupansky wrote:

> The general idea is that tokenization can generally be done in a
> language-independent manner, but stemming, synonyms, stop words, etc. must
> be done in a language-dependent manner.
>
> So, yes, text_en is a better starting point for adding in the more
> advanced language processing features.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mysurf Mail
> Sent: Monday, June 24, 2013 10:26 AM
> To: solr-user@lucene.apache.org
> Subject: What should be the definition (field type) for a field that
> will be searched with user free text
>
>
> Currently I am using text_general.
> I want to search with user free text, therefore I would like
> tokenization, stemming, etc.
> How do I define stemmers?
> Should I use text_en instead of text_general?
> Thank you.
>
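The language-dependent steps Jack describes are visible in the stock text_en type from the Solr example schema; an abridged sketch (filter list per the standard example schema, index-time analyzer only):

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

The Porter stemmer conflates regular inflections (e.g. "ships"/"shipping" both stem to "ship"); irregular forms such as "has"/"have" are not conflated by stemming and would need synonyms or a lemmatizer.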