Tools for schema.xml generation and to import from a database

2012-07-29 Thread Andre Lopes
Hi,

I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused
about what and how to do next. I will use the Jetty version for now.

Two points I need to know:

1 - I have 2 database views that I would like to import into Solr. I think
I must define a schema.xml and then import the data into that schema. Am I
correct about this?

2 - Are there any tools to autogenerate the schema.xml? And are there any
tools to import data into the schema (I'm using Python)?


Please give me some clues.

Thanks,

Best Regards,
André.


Re: Tools for schema.xml generation and to import from a database

2012-07-30 Thread Andre Lopes
Thanks for the reply Alexandre,

I will test your clues as soon as possible.

Best Regards,



On Mon, Jul 30, 2012 at 4:15 AM, Alexandre Rafalovitch
 wrote:
> If you are just starting with SOLR, you might as well jump to 4.0
> Alpha. By the time you are finished, it will be the production release.
>
> If you want to index stuff from the database, your first step is
> probably to use DataImportHandler (DIH). Once you get past the basics,
> you may want to write custom code, but start with DIH for faster
> results.
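
As a concrete sketch of what starting with DIH looks like in practice: the
handler is registered in solrconfig.xml and pointed at a DIH configuration
file (the handler path and file name below match the ones used later in this
thread; the rest is the stock registration pattern):

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <!-- DIH configuration file describing the data source and SQL -->
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

A full import is then triggered with
http://localhost:8983/solr/dataimport?command=full-import.
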
>
> You will want to modify schema.xml. I started by using the DIH example and
> just adding an extra core at first. This might be easier than building
> a full directory setup from scratch.
>
> You also don't actually need to configure the schema too much at the
> beginning. You can start by using dynamic fields. So, if in DIH you
> say that your target field is XYZ_i, it is automatically picked up as
> an integer field by SOLR (due to the "*_i" dynamic field definition,
> which you do need to have). This will not work for fields you want to
> do aggregation on (e.g. multiple text fields copied into one for easier
> search), for multilingual text fields, etc. But it will get you going.
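
For reference, the dynamic field rule Alexandre mentions looks something like
this in the stock example schema.xml (a sketch; the exact type names vary
between Solr versions):

  <!-- any field name ending in _i is treated as an integer, *_s as a string -->
  <dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
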
>
> Oh, and welcome to SOLR. You will like it.
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sun, Jul 29, 2012 at 3:45 PM, Andre Lopes  wrote:
>> Hi,
>>
>> I'm new to Solr. I've installed 3.6.1 but I'm a little bit confused
>> about what and how to do next. I will use the Jetty version for now.
>>
>> Two points I need to know:
>>
>> 1 - I have 2 database views that I would like to import into Solr. I think
>> I must define a schema.xml and then import the data into that schema. Am I
>> correct about this?
>>
>> 2 - Are there any tools to autogenerate the schema.xml? And are there any
>> tools to import data into the schema (I'm using Python)?
>>
>>
>> Please give me some clues.
>>
>> Thanks,
>>
>> Best Regards,
>> André.


How to configure schema.xml to take into account two database tables?

2012-08-05 Thread Andre Lopes
Hi,

I'm new to Solr. I've done some reading about how it works, but I can't
find a clue for my specific situation.

Here is my case. I've 2 database tables that I need to add to the
index, but they are related. One entry in the table "clients" could
have more than one entry in the table "contacts". Here is the visual
example:

Table clients:

id | name        | description
1  | customer 1  | This is the description of customer 1
2  | customer 2  | This is the description of customer 2
3  | customer 3  | This is the description of customer 3
4  | customer 4  | This is the description of customer 4

Table contacts:

id | phone_number
1  | 921234567
1  | 932122345
2  | 934545444
3  | 943322345
3  | 343445545
3  | 213443435

I think the case is simple. If I search for "921234567", I must
present information about "customer 1".
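
(For illustration: assuming the phone number ends up in a field called
phone_number, the goal is that a query such as
http://localhost:8983/solr/select?q=phone_number:921234567 returns the
customer 1 document.)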

My question: how can I set up the schema.xml in a way that takes into
account these two database tables?


Best Regards,


Re: How to configure schema.xml to take into account two database tables?

2012-08-05 Thread Andre Lopes
Thanks for the replies,

I've now successfully indexed the database using the DataImportHandler
but there is something weird. I've indexed 654 entries but I can't
output all the 654 results.

After I run
"http://localhost:8983/solr/dataimport?command=full-import" I got 654
adds:

Aug 5, 2012 6:16:51 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
http://8.com/, ... (654 adds)],commit=} 0 35

But when I query Solr with
"http://localhost:8983/solr/select?q=*:*" I only get 9 results.

I've used a very basic schema.xml:

[schema.xml content stripped by the mail archive; it declared a handful of
basic fields and set <uniqueKey>id</uniqueKey>.]

Some clues on what I'm doing wrong?

Best Regards,






On Sun, Aug 5, 2012 at 1:19 PM, Gora Mohanty  wrote:
> On 5 August 2012 17:17, Andre Lopes  wrote:
>> Hi,
>>
>> I'm new to Solr. I've done some reading about how it works, but I can't
>> find a clue for my specific situation.
>>
>> Here is my case. I've 2 database tables that I need to add to the
>> index, but they are related. One entry in the table "clients" could
>> have more than one entry in the table "contacts".
> [...]
>
> There seem to be various things that you need clarity on:
> 1. Firstly, schema.xml describes the various fields that you
> might be indexing, and/or storing in Solr. Thus, it should
> contain a description for each field that you will be using,
> no matter what data source the field might come from.
> 2. One typically flattens data when indexing into Solr.
> Following your example, as customers can have multiple
> phone numbers, you should denormalise your data.
> E.g., each Solr record could have these fields:
>    <id>, <name>, <phone_number>
> Thus, for customer 1 you would need two records, for
> customer 2 one record, and for customer 3 three records.
>
> You might find this blog useful, though it probably has
>  more detail than you need:
>  http://mysolr.com/tips/denormalized-data-structure/
> 3. You will need some way to index the data into Solr. One
> way is to use the DataImportHandler which allows
> indexing from multiple databases:
> http://wiki.apache.org/solr/DataImportHandler
>
> Regards,
> Gora
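
Putting Gora's denormalisation advice into DIH terms, a db-data-config.xml
for this case could look roughly like the sketch below, producing one Solr
document per contact row (the JDBC settings, SQL, and field names are
illustrative assumptions, not taken from the thread):

<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="myuser" password="mypass"/>
  <document>
    <!-- the root entity is the contacts table, so each phone number becomes
         its own flattened document carrying the client's details -->
    <entity name="contact"
            query="SELECT ct.phone_number, cl.id AS customer_id, cl.name, cl.description
                   FROM contacts ct JOIN clients cl ON cl.id = ct.id">
      <field column="phone_number" name="phone_number"/>
      <field column="customer_id"  name="customer_id"/>
      <field column="name"         name="customer_name"/>
      <field column="description"  name="customer_description"/>
    </entity>
  </document>
</dataConfig>

Each flattened document still needs a distinct value for whatever <uniqueKey>
the schema declares; with this layout the phone number itself (or a
customer_id/phone_number composite) is the natural candidate, since reusing
the customer id alone would make later documents overwrite earlier ones,
which is exactly the duplicate-key effect discussed further down the thread.
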


Re: How to configure schema.xml to take into account two database tables?

2012-08-05 Thread Andre Lopes
Hi,

Thanks for the replies. The info in my admin/stats is the following:

searcherName : Searcher@f4e40da main
caching : true
numDocs : 654
maxDoc : 654
reader : 
SolrIndexReader{this=6a6078e7,r=ReadOnlyDirectoryReader@6a6078e7,refCnt=1,segments=1}
readerDir : 
org.apache.lucene.store.MMapDirectory@/home/andre/workspace/test/3rd_party/solr/apache-solr-3.6.1/example/solr/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@51a422f6
indexVersion : 1343578710140
openedAt : Sun Aug 05 19:04:35 WEST 2012
registeredAt : Sun Aug 05 19:04:35 WEST 2012
warmupTime : 15

There are 654 docs.

Some more info, my solrconfig.xml:

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>


My db-data-config.xml:

<dataConfig>
  <dataSource url="jdbc:postgresql://localhost:5432/euvoudebicicleta" user="myuser"
      password="mypass" />
  [the <document> and <entity> definitions were stripped by the mail archive]
</dataConfig>

My schema.xml:

[schema.xml content stripped by the mail archive; it declared a handful of
basic fields and set <uniqueKey>uri</uniqueKey>.]

I've tested, and the SELECT in the db-data-config.xml outputs 654
results. Some more clues?
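
(As a quick sanity check, restricting the response to the key field shows
which documents the default result page actually contains, e.g.
http://localhost:8983/solr/select?q=*:*&fl=uri; fl is a standard parameter
that limits which stored fields are returned, and uri is the uniqueKey
declared in the schema above.)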


Best Regards,




On Sun, Aug 5, 2012 at 6:59 PM, Erick Erickson  wrote:
> A quick check here is to go to your admin/stats page and look at
> numDocs and maxDocs. numDocs is the number of documents that it's
> possible to find, i.e. non updated/deleted docs. maxDocs is the number
> of documents that have been added, and that count includes ones with
> duplicate unique IDs.
>
> So I'm guessing that numDocs == 9 and maxDocs == 654, which as Jack
> says indicates that your uniqueKey is repeated for lots and lots of
> your data...
>
> Best
> Erick
>
> On Sun, Aug 5, 2012 at 1:40 PM, Jack Krupansky  
> wrote:
>> Make sure the id is not duplicated. You might have inadvertently populated
>> the id field in your Solr schema with some non-key value that occurs with
>> high frequency (and may have roughly 9 unique values.)
>>
>> Examine the 9 results and their id fields. Then look at some of your input
>> data to verify that the values placed in the id field are what you expected.
>>
>> If possible, identify one input record that isn't in the 9 results but
>> should be and verify its id.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Andre Lopes
>> Sent: Sunday, August 05, 2012 1:31 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to configure schema.xml to take into account two database
>> tables?
>>
>>
>> Thanks for the replies,
>>
>> I've now successfully indexed the database using the DataImportHandler
>> but there is something weird. I've indexed 654 entries but I can't
>> output all the 654 results.
>>
>> After I run
>> "http://localhost:8983/solr/dataimport?command=full-import" I got 654
>> adds:
>>
>> Aug 5, 2012 6:16:51 PM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
>> http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
>> http://8.com/, ... (654 adds)],commit=} 0 35
>>
>> But when I query Solr with
>> "http://localhost:8983/solr/select?q=*:*" I only get 9 results.
>>
>> I've used a very basic schema.xml:
>>
>> [schema.xml content stripped by the mail archive; it declared a handful of
>> basic fields and set <uniqueKey>id</uniqueKey>.]
>>
>>
>>
>> Some clues on what I'm doing wrong?
>>
>> Best Regards,
>>
>>
>>
>>
>>
>>
>> On Sun, Aug 5, 2012 at 1:19 PM, Gora Mohanty  wrote:
>>>
>>> On 5 August 2012 17:17, Andre Lopes  wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm new to Solr. I've done some reading about how it works, but I can't
>>>> find a clue for my specific situation.
>>>>
>>>> Here is my case. I've 2 database tables that I need to add to the
>>>> index, but they are related. One entry in the table "clients" could
>>>> have more than one entry in the table "contacts".
>>>
>>> [...]
>>>
>>> There seem to be various things that you need clarity on:
>>> 1. Firstly, schema.xml describes the various fields that you
>>> might be indexing, and/or storing in Solr. Thus, it should
>>> contain a description for each field that you will be using,
>>> no matter what data source the field might come from.
>>> 2. One typically flattens data when indexing into Solr.
>>> Following your example, as customers can have multiple
>>> phone numbers, you should denormalise your data.
>>> E.g., each Solr record could have these fields:
>>>    <id>, <name>, <phone_number>
>>> Thus, for customer 1 you would need two records, for
>>> customer 2 one record, and for customer 3 three records.
>>>
>>> You might find this blog useful, though it probably has
>>>  more detail than you need:
>>>  http://mysolr.com/tips/denormalized-data-structure/
>>> 3. You will need some way to index the data into Solr. One
>>> way is to use the DataImportHandler which allows
>>> indexing from multiple databases:
>>> http://wiki.apache.org/solr/DataImportHandler
>>>
>>> Regards,
>>> Gora
>>
>>


Re: How to configure schema.xml to take into account two database tables?

2012-08-05 Thread Andre Lopes
Hi,

I've found what's wrong. By default the query was returning 10 results.

With "rows" I can now return more than 10:

http://localhost:8983/solr/select?q=*:*&rows=400
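
For larger result sets, the usual approach is to page through them by
combining the standard start and rows parameters, e.g.:

http://localhost:8983/solr/select?q=*:*&start=0&rows=100
http://localhost:8983/solr/select?q=*:*&start=100&rows=100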

Thanks for the help. From here I will try to dig deeper.

Best Regards,


On Sun, Aug 5, 2012 at 7:20 PM, Andre Lopes  wrote:
> Hi,
>
> Thanks for the replies. The info in my admin/stats is the following:
>
> searcherName : Searcher@f4e40da main
> caching : true
> numDocs : 654
> maxDoc : 654
> reader : 
> SolrIndexReader{this=6a6078e7,r=ReadOnlyDirectoryReader@6a6078e7,refCnt=1,segments=1}
> readerDir : 
> org.apache.lucene.store.MMapDirectory@/home/andre/workspace/test/3rd_party/solr/apache-solr-3.6.1/example/solr/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@51a422f6
> indexVersion : 1343578710140
> openedAt : Sun Aug 05 19:04:35 WEST 2012
> registeredAt : Sun Aug 05 19:04:35 WEST 2012
> warmupTime : 15
>
> There are 654 docs.
>
> Some more info, my solrconfig.xml:
>
>   <requestHandler name="/dataimport"
>       class="org.apache.solr.handler.dataimport.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">db-data-config.xml</str>
>     </lst>
>   </requestHandler>
>
>
> My db-data-config.xml:
>
> <dataConfig>
>   <dataSource url="jdbc:postgresql://localhost:5432/euvoudebicicleta" user="myuser"
>       password="mypass" />
>   [the <document> and <entity> definitions were stripped by the mail archive]
> </dataConfig>
>
>
> My schema.xml:
>
> [schema.xml content stripped by the mail archive; it declared a handful of
> basic fields and set <uniqueKey>uri</uniqueKey>.]
>
>
> I've tested, and the SELECT in the db-data-config.xml outputs 654
> results. Some more clues?
>
>
> Best Regards,
>
>
>
>
> On Sun, Aug 5, 2012 at 6:59 PM, Erick Erickson  
> wrote:
>> A quick check here is to go to your admin/stats page and look at
>> numDocs and maxDocs. numDocs is the number of documents that it's
>> possible to find, i.e. non updated/deleted docs. maxDocs is the number
>> of documents that have been added, and that count includes ones with
>> duplicate unique IDs.
>>
>> So I'm guessing that numDocs == 9 and maxDocs == 654, which as Jack
>> says indicates that your uniqueKey is repeated for lots and lots of
>> your data...
>>
>> Best
>> Erick
>>
>> On Sun, Aug 5, 2012 at 1:40 PM, Jack Krupansky  
>> wrote:
>>> Make sure the id is not duplicated. You might have inadvertently populated
>>> the id field in your Solr schema with some non-key value that occurs with
>>> high frequency (and may have roughly 9 unique values.)
>>>
>>> Examine the 9 results and their id fields. Then look at some of your input
>>> data to verify that the values placed in the id field are what you expected.
>>>
>>> If possible, identify one input record that isn't in the 9 results but
>>> should be and verify its id.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Andre Lopes
>>> Sent: Sunday, August 05, 2012 1:31 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: How to configure schema.xml to take into account two database
>>> tables?
>>>
>>>
>>> Thanks for the replies,
>>>
>>> I've now successfully indexed the database using the DataImportHandler
>>> but there is something weird. I've indexed 654 entries but I can't
>>> output all the 654 results.
>>>
>>> After I run
>>> "http://localhost:8983/solr/dataimport?command=full-import" I got 654
>>> adds:
>>>
>>> Aug 5, 2012 6:16:51 PM
>>> org.apache.solr.update.processor.LogUpdateProcessor finish
>>> INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
>>> http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
>>> http://8.com/, ... (654 adds)],commit=} 0 35
>>>
>>> But when I query Solr with
>>> "http://localhost:8983/solr/select?q=*:*" I only get 9 results.
>>>
>>> I've used a very basic schema.xml:
>>>
>>> [schema.xml content stripped by the mail archive; it declared a handful of
>>> basic fields and set <uniqueKey>id</uniqueKey>.]

Solr 1.4 very slow - 60.000 documents

2012-08-30 Thread Andre Lopes
Hi,

I have a Solr 1.4 installation with about 60,000 documents, but it is
getting very slow on searches. The website usually has very few users.
The machine has 512 MB of RAM, and with Nginx I'm serving one lightweight
Python-based web app.

How can I track down the cause of the Solr slowdown? Any clues?


Best Regards,