I just tested this with a DIH that does not use database input.

If the DataImportHandler JDBC code does not support a schema that has
optional fields, that is a major weakness. Noble/Shalin, is this true?

On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott <sz...@zib.de> wrote:
> Hi,
>
> since some of the fields used in your DIH configuration aren't mandatory
> (e.g., keywords and tags are defined as nullable in your db table schema),
> add a default value to all optional fields in your schema configuration
> (e.g., default = ""). Note, that Solr does not understand the db-related
> concept of null values.
>
> Solr's log output
>
> SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce
> &amp; Gabbana D&amp;G Neckties designer Tie for men 543},
> productID=productID(1.0)={220213}}]
>
> indicates that there aren't any tags or descriptions stored for the item
> with productId 220213. Since no default value is specified, Solr raises an
> error when creating the index document.
>
> -Sascha
>
> Jean-Michel Philippon-Nadeau wrote:
>>
>> Hi,
>>
>> Thanks for the reply.
>>
>> On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote:
>>>
>>> * the output of MySQL's describe command for all tables/views referenced
>>> in your DIH configuration
>>
>> mysql>  describe products;
>>
>> +----------------+------------------+------+-----+---------+----------------+
>> | Field          | Type             | Null | Key | Default | Extra
>> |
>>
>> +----------------+------------------+------+-----+---------+----------------+
>> | productID      | int(10) unsigned | NO   | PRI | NULL    |
>> auto_increment |
>> | skuCode        | varchar(320)     | YES  | MUL | NULL    |
>> |
>> | upcCode        | varchar(320)     | YES  | MUL | NULL    |
>> |
>> | name           | varchar(320)     | NO   |     | NULL    |
>> |
>> | description    | text             | NO   |     | NULL    |
>> |
>> | keywords       | text             | YES  |     | NULL    |
>> |
>> | disqusThreadID | varchar(50)      | NO   |     | NULL    |
>> |
>> | tags           | text             | YES  |     | NULL    |
>> |
>> | createdOn      | int(10) unsigned | NO   |     | NULL    |
>> |
>> | lastUpdated    | int(10) unsigned | NO   |     | NULL    |
>> |
>> | imageURL       | varchar(320)     | YES  |     | NULL    |
>> |
>> | inStock        | tinyint(1)       | YES  | MUL | 1       |
>> |
>> | active         | tinyint(1)       | YES  |     | 1       |
>> |
>>
>> +----------------+------------------+------+-----+---------+----------------+
>> 13 rows in set (0.00 sec)
>>
>> mysql>  describe product_soldby_vendor;
>> +-----------------+------------------+------+-----+---------+-------+
>> | Field           | Type             | Null | Key | Default | Extra |
>> +-----------------+------------------+------+-----+---------+-------+
>> | productID       | int(10) unsigned | NO   | MUL | NULL    |       |
>> | productVendorID | int(10) unsigned | NO   | MUL | NULL    |       |
>> | price           | double           | NO   |     | NULL    |       |
>> | currency        | varchar(5)       | NO   |     | NULL    |       |
>> | buyURL          | varchar(320)     | NO   |     | NULL    |       |
>> +-----------------+------------------+------+-----+---------+-------+
>> 5 rows in set (0.00 sec)
>>
>> mysql>  describe products_vendors_subcategories;
>>
>> +----------------------------+------------------+------+-----+---------+----------------+
>> | Field                      | Type             | Null | Key | Default |
>> Extra          |
>>
>> +----------------------------+------------------+------+-----+---------+----------------+
>> | productVendorSubcategoryID | int(10) unsigned | NO   | PRI | NULL    |
>> auto_increment |
>> | productVendorCategoryID    | int(10) unsigned | NO   |     | NULL    |
>> |
>> | labelEnglish               | varchar(320)     | NO   |     | NULL    |
>> |
>> | labelFrench                | varchar(320)     | NO   |     | NULL    |
>> |
>>
>> +----------------------------+------------------+------+-----+---------+----------------+
>> 4 rows in set (0.00 sec)
>>
>> mysql>  describe products_vendors_categories;
>>
>> +-------------------------+------------------+------+-----+---------+----------------+
>> | Field                   | Type             | Null | Key | Default |
>> Extra          |
>>
>> +-------------------------+------------------+------+-----+---------+----------------+
>> | productVendorCategoryID | int(10) unsigned | NO   | PRI | NULL    |
>> auto_increment |
>> | labelEnglish            | varchar(320)     | NO   |     | NULL    |
>> |
>> | labelFrench             | varchar(320)     | NO   |     | NULL    |
>> |
>>
>> +-------------------------+------------------+------+-----+---------+----------------+
>> 3 rows in set (0.00 sec)
>>
>> mysql>  describe product_vendor_in_subcategory;
>> +-------------------+------------------+------+-----+---------+-------+
>> | Field             | Type             | Null | Key | Default | Extra |
>> +-------------------+------------------+------+-----+---------+-------+
>> | productVendorID   | int(10) unsigned | NO   | MUL | NULL    |       |
>> | productCategoryID | int(10) unsigned | NO   | MUL | NULL    |       |
>> +-------------------+------------------+------+-----+---------+-------+
>> 2 rows in set (0.00 sec)
>>
>> mysql>  describe products_vendors_countries;
>>
>> +------------------------+------------------+------+-----+---------+----------------+
>> | Field                  | Type             | Null | Key | Default |
>> Extra          |
>>
>> +------------------------+------------------+------+-----+---------+----------------+
>> | productVendorCountryID | int(10) unsigned | NO   | PRI | NULL    |
>> auto_increment |
>> | name                   | varchar(50)      | NO   |     | NULL    |
>> |
>> | code                   | varchar(2)       | NO   |     | NULL    |
>> |
>>
>> +------------------------+------------------+------+-----+---------+----------------+
>> 3 rows in set (0.00 sec)
>>
>> mysql>  describe product_vendor_shipsto_country;
>> +------------------------+---------+------+-----+---------+-------+
>> | Field                  | Type    | Null | Key | Default | Extra |
>> +------------------------+---------+------+-----+---------+-------+
>> | productVendorID        | int(11) | NO   | MUL | NULL    |       |
>> | productVendorCountryID | int(11) | NO   | MUL | NULL    |       |
>> +------------------------+---------+------+-----+---------+-------+
>> 2 rows in set (0.00 sec)
>>
>>
>>> * the DIH configuration file (i.e., data-config.xml)
>>
>> <dataConfig>
>>   <dataSource type="JdbcDataSource"
>>               driver="com.mysql.jdbc.Driver"
>>               url="jdbc:mysql://localhost/giftiniti_api"
>>               user="giftiniti_api"
>>               password="17i6ARfqJa1K"/>
>>   <document>
>>
>>     <entity name="products"
>>             query="select productID,name,keywords,tags from products">
>>        <field column="productID" name="productID"/>
>>        <field column="name" name="name"/>
>>        <field column="keywords" name="keywords"/>
>>        <field column="tags" name="tags"/>
>>        <field column="description" name="description"/>
>>
>>        <entity name="soldby"
>>                query="select productVendorID,price from
>> product_soldby_vendor where productID='${product.productID}'">
>>          <field column="price" name="price"/>
>>
>>          <entity name="insubcategory"
>>                  query="select productCategoryID from
>> product_vendor_in_subcategory where
>> productVendorID='${soldby.productVendorID}'">
>>
>>            <entity name="subcategories"
>>                    query="select concat_ws(' ',labelEnglish,labelFrench)
>> as label, productVendorCategoryID from products_vendors_subcategories
>> where productVendorSubcategoryID='${insubcategory.productCategoryID}'">
>>              <field column="label" name="subcategories"/>
>>
>>                <entity name="categories"
>>                        query="select concat_ws('
>> ',labelEnglish,labelFrench) as label from products_vendors_categories
>> where
>> productVendorCategoryID='${subcategories.productVendorCategoryID}'">
>>                  <field column="label" name="categories"/>
>>                </entity>
>>
>>            </entity>
>>
>>          </entity>
>>
>>          <entity name="shipsto"
>>                  query="select productVendorCountryID from
>> product_vendor_shipsto_country where
>> productVendorID='${soldby.productVendorID}'">
>>            <entity name="countries"
>>                    query="select code from products_vendors_countries
>> where productVendorCountryID='${shipsto.productVendorCountryID}'">
>>              <field column="code" name="countries"/>
>>            </entity>
>>          </entity>
>>
>>        </entity>
>>
>>     </entity>
>>
>>   </document>
>> </dataConfig>
>>
>>> * the schema definition (i.e., schema.xml)
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <schema name="example" version="1.2">
>>
>>   <types>
>>     <fieldType name="string" class="solr.StrField"
>> sortMissingLast="true" omitNorms="true"/>
>>     <fieldType name="textgen" class="solr.TextField"
>> positionIncrementGap="100">
>>       <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" enablePositionIncrements="true" />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory"
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>       </analyzer>
>>     </fieldType>
>>     <fieldType name="text_ws" class="solr.TextField"
>> positionIncrementGap="100">
>>       <analyzer>
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       </analyzer>
>>     </fieldType>
>>     <fieldType name="text" class="solr.TextField"
>> positionIncrementGap="100">
>>       <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>> language="English" protected="protwords.txt"/>
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory"
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>> language="English" protected="protwords.txt"/>
>>       </analyzer>
>>     </fieldType>
>>       </analyzer>
>>     </fieldType>
>>     <fieldType name="double" class="solr.TrieDoubleField"
>> precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>>
>>   </types>
>>
>>   <fields>
>>     <field name="productID" type="string" indexed="true" stored="true"
>> required="true" />
>>
>>     <field name="name" type="textgen" indexed="true" stored="true"/>
>>     <field name="keywords" type="textgen" indexed="true" stored="true"/>
>>     <field name="tags" type="textgen" indexed="true" stored="true"/>
>>     <field name="description" type="text" indexed="true" stored="true"/>
>>
>>     <field name="price" type="double" indexed="true" stored="true"/>
>>
>>     <field name="categories" type="text_ws" indexed="true" stored="true"
>> multiValued="true" omitNorms="true" />
>>     <field name="subcategories" type="text_ws" indexed="true"
>> stored="true" multiValued="true" omitNorms="true" />
>>     <field name="countries" type="text_ws" indexed="true" stored="true"
>> multiValued="true" omitNorms="true" />
>>
>>     <!-- Text is a meta-data field containing all fields that are going
>> to be searchable by default -->
>>     <field name="text" type="text" indexed="true" stored="false"
>> multiValued="true"/>
>>   </fields>
>>
>>   <uniqueKey>productID</uniqueKey>
>>
>>   <defaultSearchField>text</defaultSearchField>
>>
>>   <solrQueryParser defaultOperator="AND"/>
>>
>>   <copyField source="name" dest="text"/>
>>   <copyField source="keywords" dest="text"/>
>>   <copyField source="tags" dest="text"/>
>>   <copyField source="description" dest="text"/>
>>
>> </schema>
>>
>>
>>
>> Jean-Michel
>>
>
>



-- 
Lance Norskog
goks...@gmail.com

Reply via email to