I am rebuilding the Docker image after each change to the config file, so Solr
starts fresh every time.
<requestHandler name="/dataimport" initParams="myInitParams"
class="solr.DataImportHandler">
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
<str name="config">solr-data-config.xml</str>
</lst>
</requestHandler>
I am still getting documents like this:
"response":{"numFound":8,"start":0,"docs":[
{
"id":"38822",
"_version_":1542264667720646656},
{
If I add the body field using the Schema section of the Admin UI, this field
is indexed during the dataimport.
It seems that solr.DataImportHandler does not allow the
add-unknown-fields-to-the-schema update.chain.
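
The two request-level checks mentioned in the quoted replies (reloading the core, and passing update.chain as a URL parameter instead of a handler default) can be sketched as below. This is a minimal sketch, assuming the host, port, core name, and dataurl used earlier in this thread; it only shows the shape of the requests and is not confirmed to change how DIH treats unknown fields.

```shell
# Values copied from earlier messages in this thread; adjust for your setup.
SOLR_BASE='http://192.168.99.100:8999/solr'
CORE='solrexchange'
DATA_URL='/code/solr/data/search/dih/data_search.xml'
CHAIN='add-unknown-fields-to-the-schema'

# CoreAdmin RELOAD request: makes Solr re-read solrconfig.xml for the core.
RELOAD_URL="${SOLR_BASE}/admin/cores?action=RELOAD&core=${CORE}"

# Full import with update.chain passed as a request parameter
# rather than as a default inside the /dataimport handler.
IMPORT_URL="${SOLR_BASE}/${CORE}/dataimport?command=full-import&commit=true&update.chain=${CHAIN}&dataurl=${DATA_URL}"

# Print the commands instead of running them; drop 'echo' to send the requests.
echo curl "'${RELOAD_URL}'"
echo curl "'${IMPORT_URL}'"
```

The URLs are single-quoted in the printed commands so that the shell does not interpret the & characters.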
Pierre
> On 10 Aug 2016, at 18:33, Alexandre Rafalovitch <[email protected]> wrote:
>
> Ok, to reduce the magic, you can just stick "update.chain" parameter
> inside the defaults of the dataimport handler directly.
>
> You can also pass it as a URL parameter. That's what the 'defaults'
> section means: default values for request parameters.
>
> And, just to be paranoid, you did reload the core after each of those
> changes to test it? These are not picked up automatically.
>
> Regards,
> Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 10 August 2016 at 18:28, Pierre Caserta <[email protected]> wrote:
>> It did not work,
>> I tried many things and ended up trying this:
>>
>> <requestHandler name="/dataimport" initParams="myInitParams"
>> class="solr.DataImportHandler">
>> <lst name="defaults">
>> <str name="config">solr-data-config.xml</str>
>> </lst>
>> </requestHandler>
>> <initParams name="myInitParams" path="/update/**,/dataimport">
>> <lst name="defaults">
>> <str name="update.chain">add-unknown-fields-to-the-schema</str>
>> </lst>
>> </initParams>
>>
>> Regards,
>> Pierre
>>
>>> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch <[email protected]> wrote:
>>>
>>> Your initParams section does not apply to /dataimport handler as
>>> defined. Try modifying it to say:
>>> path="/update/**,/dataimport"
>>>
>>> Hopefully, that's all it takes.
>>>
>>> Managed schema is enabled by default, but schemaless mode is the next
>>> layer on top. With managed schema, you can use the API to add your
>>> fields (or the Schema screen in the new Admin UI). With schemaless mode,
>>> Solr tries to guess the field type as it adds the field automatically.
>>>
>>>
>>> Regards,
>>> Alex.
>>>
>>>
>>>
>>> On 10 August 2016 at 18:04, Pierre Caserta <[email protected]> wrote:
>>>> Hi Alex,
>>>> thanks for your answer.
>>>>
>>>> Yes, my solrconfig.xml contains the add-unknown-fields-to-the-schema chain.
>>>>
>>>> <initParams path="/update/**">
>>>> <lst name="defaults">
>>>> <str name="update.chain">add-unknown-fields-to-the-schema</str>
>>>> </lst>
>>>> </initParams>
>>>>
>>>> I created my core using this command:
>>>>
>>>> curl 'http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/solr/server/solr/solrexchange&configSet=data_driven_schema_configs_custom'
>>>>
>>>> I am using the example configset data_driven_schema_configs and I simply
>>>> added:
>>>>
>>>> <lib dir="${solr.install.dir:../../../..}/dist/"
>>>> regex="solr-dataimporthandler-.*\.jar" />
>>>> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>>>> <lst name="defaults">
>>>> <str name="config">data-config.xml</str>
>>>> </lst>
>>>> </requestHandler>
>>>>
>>>> I thought schemaless mode was enabled by default, but I also tried
>>>> adding this config and I get the same result.
>>>>
>>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>> <bool name="mutable">true</bool>
>>>> <str name="managedSchemaResourceName">managed-schema</str>
>>>> </schemaFactory>
>>>>
>>>> How can I copy the schemaless URP chain and add the parameter that
>>>> calls it to the DIH handler?
>>>>
>>>>
>>>>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Do you have the actual fields defined? If not, then I am guessing that
>>>>> your 'post' test was against a different collection that had
>>>>> schemaless mode enabled and your DIH one is against one where
>>>>> schemaless mode is not enabled (look for
>>>>> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
>>>>> Solr examples for DIH do not have schemaless mode enabled.
>>>>>
>>>>> I _believe_ you can copy the schemaless URP chain and add the
>>>>> parameter that calls it to the DIH handler, and it _should_ work. But
>>>>> I am not betting on it without testing, as DIH also has some magic
>>>>> code to ignore fields not defined in the schema, because it is
>>>>> designed to extract only the relevant fields from the database even
>>>>> with a 'select *' statement.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Alex.
>>>>>
>>>>>
>>>>> On 10 August 2016 at 17:12, Pierre Caserta <[email protected]>
>>>>> wrote:
>>>>>> Hi,
>>>>>> It seems that using the DataImportHandler with an XPathEntityProcessor
>>>>>> config and a managed-schema setup imports only the id and _version_
>>>>>> fields.
>>>>>>
>>>>>> data-config.xml
>>>>>>
>>>>>> <dataConfig>
>>>>>> <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>>> <document>
>>>>>> <entity name="post"
>>>>>> processor="XPathEntityProcessor"
>>>>>> stream="true"
>>>>>> forEach="/posts/row/"
>>>>>> url="${dataimporter.request.dataurl}"
>>>>>> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer">
>>>>>> <field column="id" xpath="/posts/row/@Id" />
>>>>>> <field column="postTypeId" xpath="/posts/row/@PostTypeId" />
>>>>>> <field column="acceptedAnswerId"
>>>>>> xpath="/posts/row/@AcceptedAnswerId" />
>>>>>> <field column="creationDate" xpath="/posts/row/@CreationDate"
>>>>>> dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss.SSS" />
>>>>>> <field column="postScore" xpath="/posts/row/@Score" />
>>>>>> <field column="viewCount" xpath="/posts/row/@ViewCount" />
>>>>>> <field column="body" xpath="/posts/row/@Body" stripHTML="true"
>>>>>> />
>>>>>> <field column="ownerUserId" xpath="/posts/row/@OwnerUserId" />
>>>>>> <field column="lastEditorUserId"
>>>>>> xpath="/posts/row/@LastEditorUserId" />
>>>>>> <field column="lastEditorDisplayName"
>>>>>> xpath="/posts/row/@LastEditorDisplayName" />
>>>>>> <field column="lastActivityDate"
>>>>>> xpath="/posts/row/@LastActivityDate"
>>>>>> dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss.SSS" />
>>>>>> <field column="title" xpath="/posts/row/@Title" />
>>>>>> <field column="trimmedTags" xpath="/posts/row/@Tags"
>>>>>> regex="&lt;(.*)&gt;" />
>>>>>> <field column="tags" sourceColName="trimmedTags"
>>>>>> splitBy="&gt;&lt;" />
>>>>>> <field column="answerCount" xpath="/posts/row/@AnswerCount" />
>>>>>> <field column="commentCount" xpath="/posts/row/@CommentCount"
>>>>>> />
>>>>>> <field column="favoriteCount" xpath="/posts/row/@FavoriteCount"
>>>>>> />
>>>>>> <field column="communityOwnedDate"
>>>>>> xpath="/posts/row/@CommunityOwnedDate"
>>>>>> dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss.SSS" />
>>>>>> </entity>
>>>>>> </document>
>>>>>> </dataConfig>
>>>>>>
>>>>>>
>>>>>> http://192.168.99.100:8999/solr/solrexchange/select?indent=on&q=*:*&wt=json
>>>>>> {
>>>>>> "responseHeader":{
>>>>>> "status":0,
>>>>>> "QTime":0,
>>>>>> "params":{
>>>>>> "q":"*:*",
>>>>>> "indent":"on",
>>>>>> "wt":"json",
>>>>>> "_":"1470811193595"}},
>>>>>> "response":{"numFound":8,"start":0,"docs":[
>>>>>> {
>>>>>> "id":"38822",
>>>>>> "_version_":1542258196375142400},
>>>>>> {
>>>>>> "id":"38836",
>>>>>> "_version_":1542258196387725312},
>>>>>> {
>>>>>> "id":"63896",
>>>>>> "_version_":1542258196388773888},
>>>>>> {
>>>>>> "id":"65406",
>>>>>> "_version_":1542258196391919616},
>>>>>> {
>>>>>> "id":"1357173",
>>>>>> "_version_":1542258196391919617},
>>>>>> {
>>>>>> "id":"5339763",
>>>>>> "_version_":1542258196392968192},
>>>>>> {
>>>>>> "id":"9932722",
>>>>>> "_version_":1542258196392968193},
>>>>>> {
>>>>>> "id":"9217299",
>>>>>> "_version_":1542258196392968194}]
>>>>>> }}
>>>>>>
>>>>>> data_search.xml (8 rows)
>>>>>>
>>>>>>
>>>>>>
>>>>>> The URL I am hitting (with a custom dataurl parameter):
>>>>>>
>>>>>> curl 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import&commit=true&dataurl=/code/solr/data/search/dih/data_search.xml'
>>>>>>
>>>>>> I changed my data to use <add> <doc> <field> and used the bin/post
>>>>>> tool, and this works as expected.
>>>>>> Now I would like to make it work with the DataImportHandler.
>>>>>> How can I use the DataImportHandler to import my documents?
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre Caserta
>>>>>>
>>>>>>
>>>>
>>