Hi,
Thanks for the replies. The info in my admin/stats is the following:
searcherName : Searcher@f4e40da main
caching : true
numDocs : 654
maxDoc : 654
reader : SolrIndexReader{this=6a6078e7,r=ReadOnlyDirectoryReader@6a6078e7,refCnt=1,segments=1}
readerDir : org.apache.lucene.store.MMapDirectory@/home/andre/workspace/test/3rd_party/solr/apache-solr-3.6.1/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@51a422f6
indexVersion : 1343578710140
openedAt : Sun Aug 05 19:04:35 WEST 2012
registeredAt : Sun Aug 05 19:04:35 WEST 2012
warmupTime : 15
There are 654 docs.
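If I read Erick's explanation below correctly, the gap between maxDoc and numDocs is the number of documents that were deleted or overwritten by a later add with the same uniqueKey. A minimal Python sketch of that arithmetic (the function name is made up for illustration, it is not part of Solr):

```python
def overwritten_or_deleted(num_docs: int, max_doc: int) -> int:
    """Documents still counted by maxDoc but no longer searchable."""
    if num_docs > max_doc:
        raise ValueError("numDocs should never exceed maxDoc")
    return max_doc - num_docs


# Values copied from the admin/stats output above.
print(overwritten_or_deleted(num_docs=654, max_doc=654))  # 0

# The case Erick guessed at (numDocs == 9, maxDoc == 654).
print(overwritten_or_deleted(num_docs=9, max_doc=654))  # 645
```

With my values the gap is 0, which suggests that duplicate uri values are not what is hiding the documents.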
Some more info. My solrconfig.xml:
<!-- Request handler added by Andre Lopes to import data from database -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
My db-data-config.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/euvoudebicicleta" user="myuser"
              password="mypass" />
  <document>
    <entity name="bicyclebusinesses" query="select * from table_text__single_occurrencies order by date_inserted">
      <field column="uri" name="uri" />
      <field column="business_name" name="name" />
      <field column="business_address" name="address" />
    </entity>
  </document>
</dataConfig>
My schema.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>
  <fields>
    <dynamicField name="*" type="string" indexed="false" stored="false" />
    <field name="uri" type="string" indexed="true" stored="true" />
    <!--
    <field name="name" type="string" indexed="true" stored="true" />
    <field name="address" type="string" indexed="true" stored="true" />
    -->
  </fields>
  <uniqueKey>uri</uniqueKey>
  <!-- <defaultSearchField>catchall</defaultSearchField> -->
</schema>
I've tested it, and the SELECT in db-data-config.xml returns 654
rows. Any more clues?
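One more check that might be worth running: as far as I know, /select returns only the first `rows` documents (10 by default), while the `numFound` attribute of the response reports the total number of matches. The Python sketch below builds a query URL with an explicit `rows` value and compares `numFound` with the number of <doc> elements in a response. The helper names and the trimmed sample response are hypothetical, written only for illustration; the sample is not real output from my index.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def select_url(base: str, query: str = "*:*", rows: int = 10) -> str:
    """Build a /select URL with an explicit rows parameter."""
    return base + "/select?" + urlencode({"q": query, "rows": rows})


def summarize(response_xml: str) -> tuple:
    """Return (numFound, number of <doc> elements actually returned)."""
    root = ET.fromstring(response_xml)
    result = root.find("result[@name='response']")
    return int(result.get("numFound")), len(result.findall("doc"))


# Hypothetical, trimmed response in the shape of Solr's XML response format.
SAMPLE = """<response>
  <result name="response" numFound="654" start="0">
    <doc><str name="uri">http://1.com</str></doc>
    <doc><str name="uri">http://2.com</str></doc>
  </result>
</response>"""

found, returned = summarize(SAMPLE)
print(select_url("http://localhost:8983/solr", rows=1000))
print(f"numFound={found}, returned={returned}")
```

If numFound turns out to be 654 while fewer documents come back, the documents are in the index and only the page size of the response is limiting what I see.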
Best Regards,
On Sun, Aug 5, 2012 at 6:59 PM, Erick Erickson <[email protected]> wrote:
> A quick check here is to go to your admin/stats page and look at
> numDocs and maxDoc. numDocs is the number of documents that it's
> possible to find, i.e. docs that have not been deleted or replaced by
> an update. maxDoc is the number of documents that have been added,
> and that count includes ones with duplicate unique IDs.
>
> So I'm guessing that numDocs == 9 and maxDoc == 654, which as Jack
> says indicates that your uniqueKey is repeated for lots and lots of
> your data...
>
> Best
> Erick
>
> On Sun, Aug 5, 2012 at 1:40 PM, Jack Krupansky <[email protected]>
> wrote:
>> Make sure the id is not duplicated. You might have inadvertently populated
>> the id field in your Solr schema with some non-key value that occurs with
>> high frequency (and may have roughly 9 unique values.)
>>
>> Examine the 9 results and their id fields. Then look at some of your input
>> data to verify that the values placed in the id field are what you expected.
>>
>> If possible, identify one input record that isn't in the 9 results but
>> should be and verify its id.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Andre Lopes
>> Sent: Sunday, August 05, 2012 1:31 PM
>> To: [email protected]
>> Subject: Re: How to configure schema.xml to take in account two database
>> tables?
>>
>>
>> Thanks for the replies,
>>
>> I've now successfully indexed the database using the DataImportHandler
>> but there is something weird. I've indexed 654 entries but I can't
>> output all the 654 results.
>>
>> After I run
>> "http://localhost:8983/solr/dataimport?command=full-import" I got 654
>> adds:
>>
>> Aug 5, 2012 6:16:51 PM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
>> http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
>> http://8.com/, ... (654 adds)],commit=} 0 35
>>
>> But when I query Solr with
>> "http://localhost:8983/solr/select?q=*:*" I only get 9 results.
>>
>> I've used a very basic schema.xml:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <schema name="example" version="1.5">
>>
>> <types>
>> <fieldType name="string" class="solr.StrField"/>
>> </types>
>>
>> <fields>
>> <dynamicField name="*" type="string" indexed="true" stored="true"
>> />
>>
>> <field name="id" type="string" indexed="true" stored="true"
>> multiValued="false" />
>> <field name="name" type="string" indexed="true" stored="true"
>> multiValued="false" />
>> <field name="address" type="string" indexed="true" stored="true"
>> multiValued="false" />
>>
>> </fields>
>>
>> <uniqueKey>id</uniqueKey>
>> <!-- <defaultSearchField>catchall</defaultSearchField> -->
>>
>> </schema>
>>
>>
>> Any clues on what I'm doing wrong?
>>
>> Best Regards,
>>
>> On Sun, Aug 5, 2012 at 1:19 PM, Gora Mohanty <[email protected]> wrote:
>>>
>>> On 5 August 2012 17:17, Andre Lopes <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm new to Solr. I've read a bit about how it works, but I can't
>>>> find a clue for my specific situation.
>>>>
>>>> Here is my case. I have 2 database tables that I need to add to the
>>>> index, but they are related. One entry in the table "clients" could
>>>> have more than one entry in the table "contacts".
>>>
>>> [...]
>>>
>>> There seem to be various things that you need clarity on:
>>> 1. Firstly, schema.xml describes the various fields that you
>>> might be indexing, and/or storing in Solr. Thus, it should
>>> contain a description for each field that you will be using,
>>> no matter what data source the field might come from.
>>> 2. One typically flattens data when indexing into Solr.
>>> Following your example, as customers can have multiple
>>> phone numbers, you should denormalise your data.
>>> E.g., each Solr record could have these fields:
>>> <cust. name>, <cust. desc.>, <phone>
>>> Thus, for customer 1 you would need two records, for
>>> customer 2 one record, and for customer 3 three records.
>>>
>>> You might find this blog useful, though it probably has
>>> more detail than you need:
>>> http://mysolr.com/tips/denormalized-data-structure/
>>> 3. You will need some way to index the data into Solr. One
>>> way is to use the DataImportHandler which allows
>>> indexing from multiple databases:
>>> http://wiki.apache.org/solr/DataImportHandler
>>>
>>> Regards,
>>> Gora
>>
>>