I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true?
On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott <sz...@zib.de> wrote: > Hi, > > since some of the fields used in your DIH configuration aren't mandatory > (e.g., keywords and tags are defined as nullable in your db table schema), > add a default value to all optional fields in your schema configuration > (e.g., default = ""). Note, that Solr does not understand the db-related > concept of null values. > > Solr's log output > > SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce > & Gabbana D&G Neckties designer Tie for men 543}, > productID=productID(1.0)={220213}}] > > indicates that there aren't any tags or descriptions stored for the item > with productId 220213. Since no default value is specified, Solr raises an > error when creating the index document. > > -Sascha > > Jean-Michel Philippon-Nadeau wrote: >> >> Hi, >> >> Thanks for the reply. >> >> On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: >>> >>> * the output of MySQL's describe command for all tables/views referenced >>> in your DIH configuration >> >> mysql> describe products; >> >> +----------------+------------------+------+-----+---------+----------------+ >> | Field | Type | Null | Key | Default | Extra >> | >> >> +----------------+------------------+------+-----+---------+----------------+ >> | productID | int(10) unsigned | NO | PRI | NULL | >> auto_increment | >> | skuCode | varchar(320) | YES | MUL | NULL | >> | >> | upcCode | varchar(320) | YES | MUL | NULL | >> | >> | name | varchar(320) | NO | | NULL | >> | >> | description | text | NO | | NULL | >> | >> | keywords | text | YES | | NULL | >> | >> | disqusThreadID | varchar(50) | NO | | NULL | >> | >> | tags | text | YES | | NULL | >> | >> | createdOn | int(10) unsigned | NO | | NULL | >> | >> | lastUpdated | int(10) unsigned | NO | | NULL | >> | >> | imageURL | varchar(320) | YES | | NULL | >> | >> | inStock | tinyint(1) | YES | MUL | 1 | >> | >> | active | tinyint(1) | YES | | 1 | >> | >> >> +----------------+------------------+------+-----+---------+----------------+ >> 13 rows in set (0.00 sec) >> >> mysql> describe product_soldby_vendor; >> +-----------------+------------------+------+-----+---------+-------+ >> | Field | Type | Null | Key | Default | Extra | >> +-----------------+------------------+------+-----+---------+-------+ >> | productID | int(10) unsigned | NO | MUL | NULL | | >> | productVendorID | int(10) unsigned | NO | MUL | NULL | | >> | price | double | NO | | NULL | | >> | currency | varchar(5) | NO | | NULL | | >> | buyURL | varchar(320) | NO | | NULL | | >> +-----------------+------------------+------+-----+---------+-------+ >> 5 rows in set (0.00 sec) >> >> mysql> describe products_vendors_subcategories; >> >> +----------------------------+------------------+------+-----+---------+----------------+ >> | Field | Type | Null | Key | Default | >> Extra | >> >> +----------------------------+------------------+------+-----+---------+----------------+ >> | productVendorSubcategoryID | int(10) unsigned | NO | PRI | NULL | >> auto_increment | >> | productVendorCategoryID | int(10) unsigned | NO | | NULL | >> | >> | labelEnglish | varchar(320) | NO | | NULL | >> | >> | labelFrench | varchar(320) | NO | | NULL | >> | >> >> +----------------------------+------------------+------+-----+---------+----------------+ >> 4 rows in set (0.00 sec) >> >> mysql> describe products_vendors_categories; >> >> +-------------------------+------------------+------+-----+---------+----------------+ >> | Field | Type | Null | Key | Default | >> Extra | >> >> +-------------------------+------------------+------+-----+---------+----------------+ >> | productVendorCategoryID | int(10) unsigned | NO | PRI | NULL | >> auto_increment | >> | labelEnglish | varchar(320) | NO | | NULL | >> | >> | labelFrench | varchar(320) | NO | | NULL | >> | >> >> +-------------------------+------------------+------+-----+---------+----------------+ >> 3 rows in set (0.00 sec) >> >> mysql> describe product_vendor_in_subcategory; >> +-------------------+------------------+------+-----+---------+-------+ >> | Field | Type | Null | Key | Default | Extra | >> +-------------------+------------------+------+-----+---------+-------+ >> | productVendorID | int(10) unsigned | NO | MUL | NULL | | >> | productCategoryID | int(10) unsigned | NO | MUL | NULL | | >> +-------------------+------------------+------+-----+---------+-------+ >> 2 rows in set (0.00 sec) >> >> mysql> describe products_vendors_countries; >> >> +------------------------+------------------+------+-----+---------+----------------+ >> | Field | Type | Null | Key | Default | >> Extra | >> >> +------------------------+------------------+------+-----+---------+----------------+ >> | productVendorCountryID | int(10) unsigned | NO | PRI | NULL | >> auto_increment | >> | name | varchar(50) | NO | | NULL | >> | >> | code | varchar(2) | NO | | NULL | >> | >> >> +------------------------+------------------+------+-----+---------+----------------+ >> 3 rows in set (0.00 sec) >> >> mysql> describe product_vendor_shipsto_country; >> +------------------------+---------+------+-----+---------+-------+ >> | Field | Type | Null | Key | Default | Extra | >> +------------------------+---------+------+-----+---------+-------+ >> | productVendorID | int(11) | NO | MUL | NULL | | >> | productVendorCountryID | int(11) | NO | MUL | NULL | | >> +------------------------+---------+------+-----+---------+-------+ >> 2 rows in set (0.00 sec) >> >> >>> * the DIH configuration file (i.e., data-config.xml) >> >> <dataConfig> >> <dataSource type="JdbcDataSource" >> driver="com.mysql.jdbc.Driver" >> url="jdbc:mysql://localhost/giftiniti_api" >> user="giftiniti_api" >> password="17i6ARfqJa1K"/> >> <document> >> >> <entity name="products" >> query="select productID,name,keywords,tags from products"> >> <field column="productID" name="productID"/> >> <field column="name" name="name"/> >> <field column="keywords" name="keywords"/> >> <field column="tags" name="tags"/> >> <field column="description" name="description"/> >> >> <entity name="soldby" >> query="select productVendorID,price from >> product_soldby_vendor where productID='${product.productID}'"> >> <field column="price" name="price"/> >> >> <entity name="insubcategory" >> query="select productCategoryID from >> product_vendor_in_subcategory where >> productVendorID='${soldby.productVendorID}'"> >> >> <entity name="subcategories" >> query="select concat_ws(' ',labelEnglish,labelFrench) >> as label, productVendorCategoryID from products_vendors_subcategories >> where productVendorSubcategoryID='${insubcategory.productCategoryID}'"> >> <field column="label" name="subcategories"/> >> >> <entity name="categories" >> query="select concat_ws(' >> ',labelEnglish,labelFrench) as label from products_vendors_categories >> where >> productVendorCategoryID='${subcategories.productVendorCategoryID}'"> >> <field column="label" name="categories"/> >> </entity> >> >> </entity> >> >> </entity> >> >> <entity name="shipsto" >> query="select productVendorCountryID from >> product_vendor_shipsto_country where >> productVendorID='${soldby.productVendorID}'"> >> <entity name="countries" >> query="select code from products_vendors_countries >> where productVendorCountryID='${shipsto.productVendorCountryID}'"> >> <field column="code" name="countries"/> >> </entity> >> </entity> >> >> </entity> >> >> </entity> >> >> </document> >> </dataConfig> >> >>> * the schema definition (i.e., schema.xml) >> >> <?xml version="1.0" encoding="UTF-8" ?> >> <schema name="example" version="1.2"> >> >> <types> >> <fieldType name="string" class="solr.StrField" >> sortMissingLast="true" omitNorms="true"/> >> <fieldType name="textgen" class="solr.TextField" >> positionIncrementGap="100"> >> <analyzer type="index"> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/> >> <filter class="solr.StopFilterFactory" ignoreCase="true" >> words="stopwords.txt" enablePositionIncrements="true" /> >> <filter class="solr.WordDelimiterFilterFactory" >> generateWordParts="1" generateNumberParts="1" catenateWords="1" >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/> >> <filter class="solr.LowerCaseFilterFactory"/> >> </analyzer> >> <analyzer type="query"> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/> >> <filter class="solr.SynonymFilterFactory" >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >> <filter class="solr.StopFilterFactory" >> ignoreCase="true" >> words="stopwords.txt" >> enablePositionIncrements="true" >> /> >> <filter class="solr.WordDelimiterFilterFactory" >> generateWordParts="1" generateNumberParts="1" catenateWords="0" >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/> >> <filter class="solr.LowerCaseFilterFactory"/> >> </analyzer> >> </fieldType> >> <fieldType name="text_ws" class="solr.TextField" >> positionIncrementGap="100"> >> <analyzer> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/> >> </analyzer> >> </fieldType> >> <fieldType name="text" class="solr.TextField" >> positionIncrementGap="100"> >> <analyzer type="index"> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/> >> <filter class="solr.StopFilterFactory" >> ignoreCase="true" >> words="stopwords.txt" >> enablePositionIncrements="true" >> /> >> <filter class="solr.WordDelimiterFilterFactory" >> generateWordParts="1" generateNumberParts="1" catenateWords="1" >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> >> <filter class="solr.LowerCaseFilterFactory"/> >> <filter class="solr.SnowballPorterFilterFactory" >> language="English" protected="protwords.txt"/> >> </analyzer> >> <analyzer type="query"> >> <tokenizer class="solr.WhitespaceTokenizerFactory"/> >> <filter class="solr.SynonymFilterFactory" >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >> <filter class="solr.StopFilterFactory" >> ignoreCase="true" >> words="stopwords.txt" >> enablePositionIncrements="true" >> /> >> <filter class="solr.WordDelimiterFilterFactory" >> generateWordParts="1" generateNumberParts="1" catenateWords="0" >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> >> <filter class="solr.LowerCaseFilterFactory"/> >> <filter class="solr.SnowballPorterFilterFactory" >> language="English" protected="protwords.txt"/> >> </analyzer> >> </fieldType> >> </analyzer> >> </fieldType> >> <fieldType name="double" class="solr.TrieDoubleField" >> precisionStep="0" omitNorms="true" positionIncrementGap="0"/> >> >> </types> >> >> <fields> >> <field name="productID" type="string" indexed="true" stored="true" >> required="true" /> >> >> <field name="name" type="textgen" indexed="true" stored="true"/> >> <field name="keywords" type="textgen" indexed="true" stored="true"/> >> <field name="tags" type="textgen" indexed="true" stored="true"/> >> <field name="description" type="text" indexed="true" stored="true"/> >> >> <field name="price" type="double" indexed="true" stored="true"/> >> >> <field name="categories" type="text_ws" indexed="true" stored="true" >> multiValued="true" omitNorms="true" /> >> <field name="subcategories" type="text_ws" indexed="true" >> stored="true" multiValued="true" omitNorms="true" /> >> <field name="countries" type="text_ws" indexed="true" stored="true" >> multiValued="true" omitNorms="true" /> >> >> <!-- Text is a meta-data field containing all fields that are going >> to be searchable by default --> >> <field name="text" type="text" indexed="true" stored="false" >> multiValued="true"/> >> </fields> >> >> <uniqueKey>productID</uniqueKey> >> >> <defaultSearchField>text</defaultSearchField> >> >> <solrQueryParser defaultOperator="AND"/> >> >> <copyField source="name" dest="text"/> >> <copyField source="keywords" dest="text"/> >> <copyField source="tags" dest="text"/> >> <copyField source="description" dest="text"/> >> >> </schema> >> >> >> >> Jean-Michel >> > > -- Lance Norskog goks...@gmail.com