I rebuild the Docker image after each change to the config file, so Solr starts fresh every time.

<requestHandler name="/dataimport" initParams="myInitParams" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
    <str name="config">solr-data-config.xml</str>
  </lst>
</requestHandler>

I still get documents like this:

"response":{"numFound":8,"start":0,"docs":[
      {
        "id":"38822",
        "_version_":1542264667720646656},
      {

If I add the Body field using the Schema section of the Admin UI, the field gets indexed during the dataimport.
It seems that solr.DataImportHandler does not allow the add-unknown-fields-to-the-schema update.chain.

Pierre

> On 10 Aug 2016, at 18:33, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> Ok, to reduce the magic, you can just stick the "update.chain" parameter
> inside the defaults of the dataimport handler directly.
>
> You can also pass it just as a URL parameter. That's what the 'defaults'
> section means.
>
> And, just to be paranoid, you did reload the core after each of those
> changes to test it? These are not picked up automatically.
>
> Regards,
> Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 10 August 2016 at 18:28, Pierre Caserta <pierre.case...@gmail.com> wrote:
>> It did not work,
>> I tried many things and ended up trying this:
>>
>> <requestHandler name="/dataimport" initParams="myInitParams" class="solr.DataImportHandler">
>>   <lst name="defaults">
>>     <str name="config">solr-data-config.xml</str>
>>   </lst>
>> </requestHandler>
>> <initParams name="myInitParams" path="/update/**,/dataimport">
>>   <lst name="defaults">
>>     <str name="update.chain">add-unknown-fields-to-the-schema</str>
>>   </lst>
>> </initParams>
>>
>> Regards,
>> Pierre
>>
>>> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>>
>>> Your initParams section does not apply to /dataimport handler as
>>> defined.
>>> Try modifying it to say:
>>> path="/update/**,/dataimport"
>>>
>>> Hopefully, that's all it takes.
>>>
>>> Managed schema is enabled by default, but schemaless mode is the next
>>> layer on top. With managed schema, you can use the API to add your
>>> fields (or the new Admin UI in the Schema screen). With schemaless mode,
>>> it tries to guess the field type as it adds it automatically.
>>>
>>>
>>> Regards,
>>> Alex.
>>>
>>>
>>> On 10 August 2016 at 18:04, Pierre Caserta <pierre.case...@gmail.com> wrote:
>>>> Hi Alex,
>>>> thanks for your answer.
>>>>
>>>> Yes, my solrconfig.xml contains the add-unknown-fields-to-the-schema chain.
>>>>
>>>> <initParams path="/update/**">
>>>>   <lst name="defaults">
>>>>     <str name="update.chain">add-unknown-fields-to-the-schema</str>
>>>>   </lst>
>>>> </initParams>
>>>>
>>>> I created my core using this command:
>>>>
>>>> curl 'http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/solr/server/solr/solrexchange&configSet=data_driven_schema_configs_custom'
>>>>
>>>> I am using the example configset data_driven_schema_configs and I simply
>>>> added:
>>>>
>>>> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
>>>> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>>>>   <lst name="defaults">
>>>>     <str name="config">data-config.xml</str>
>>>>   </lst>
>>>> </requestHandler>
>>>>
>>>> I thought the schemaless mode was enabled by default, but I also tried
>>>> adding this config and I get the same result.
>>>>
>>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>>   <bool name="mutable">true</bool>
>>>>   <str name="managedSchemaResourceName">managed-schema</str>
>>>> </schemaFactory>
>>>>
>>>> How can I update my schemaless URP chain and add the parameter to call it
>>>> to DIH?
>>>>
>>>>
>>>>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch <arafa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Do you have the actual fields defined? If not, then I am guessing that
>>>>> your 'post' test was against a different collection that had
>>>>> schemaless mode enabled and your DIH one is against one where
>>>>> schemaless mode is not enabled (look for
>>>>> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
>>>>> Solr examples for DIH do not have schemaless mode enabled.
>>>>>
>>>>> I _believe_ you can copy the schemaless URP chain and add the
>>>>> parameter to call it to DIH handler and it _should_ work. But I am not
>>>>> betting on it without testing it, as DIH also has some magic code to
>>>>> ignore fields not defined in schema because it is designed to work
>>>>> with only extracting relevant fields from the database even with
>>>>> 'select *' statement.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Alex.
>>>>>
>>>>>
>>>>> On 10 August 2016 at 17:12, Pierre Caserta <pierre.case...@gmail.com>
>>>>> wrote:
>>>>>> Hi,
>>>>>> It seems that using the DataImportHandler with an XPathEntityProcessor
>>>>>> config with a managed-schema setup only imports the id and _version_
>>>>>> fields.
>>>>>>
>>>>>> data-config.xml
>>>>>>
>>>>>> <dataConfig>
>>>>>>   <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>>>   <document>
>>>>>>     <entity name="post"
>>>>>>             processor="XPathEntityProcessor"
>>>>>>             stream="true"
>>>>>>             forEach="/posts/row/"
>>>>>>             url="${dataimporter.request.dataurl}"
>>>>>>             transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer">
>>>>>>       <field column="id" xpath="/posts/row/@Id" />
>>>>>>       <field column="postTypeId" xpath="/posts/row/@PostTypeId" />
>>>>>>       <field column="acceptedAnswerId" xpath="/posts/row/@AcceptedAnswerId" />
>>>>>>       <field column="creationDate" xpath="/posts/row/@CreationDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>>>       <field column="postScore" xpath="/posts/row/@Score" />
>>>>>>       <field column="viewCount" xpath="/posts/row/@ViewCount" />
>>>>>>       <field column="body" xpath="/posts/row/@Body" stripHTML="true" />
>>>>>>       <field column="ownerUserId" xpath="/posts/row/@OwnerUserId" />
>>>>>>       <field column="lastEditorUserId" xpath="/posts/row/@LastEditorUserId" />
>>>>>>       <field column="lastEditorDisplayName" xpath="/posts/row/@LastEditorDisplayName" />
>>>>>>       <field column="lastActivityDate" xpath="/posts/row/@LastActivityDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>>>       <field column="title" xpath="/posts/row/@Title" />
>>>>>>       <field column="trimmedTags" xpath="/posts/row/@Tags" regex="&lt;(.*)&gt;" />
>>>>>>       <field column="tags" sourceColName="trimmedTags" splitBy="&gt;&lt;" />
>>>>>>       <field column="answerCount" xpath="/posts/row/@AnswerCount" />
>>>>>>       <field column="commentCount" xpath="/posts/row/@CommentCount" />
>>>>>>       <field column="favoriteCount" xpath="/posts/row/@FavoriteCount" />
>>>>>>       <field column="communityOwnedDate" xpath="/posts/row/@CommunityOwnedDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>>>     </entity>
>>>>>>   </document>
>>>>>> </dataConfig>
>>>>>>
>>>>>>
>>>>>> http://192.168.99.100:8999/solr/solrexchange/select?indent=on&q=*:*&wt=json
>>>>>>
>>>>>> {
>>>>>>   "responseHeader":{
>>>>>>     "status":0,
>>>>>>     "QTime":0,
>>>>>>     "params":{
>>>>>>       "q":"*:*",
>>>>>>       "indent":"on",
>>>>>>       "wt":"json",
>>>>>>       "_":"1470811193595"}},
>>>>>>   "response":{"numFound":8,"start":0,"docs":[
>>>>>>       {
>>>>>>         "id":"38822",
>>>>>>         "_version_":1542258196375142400},
>>>>>>       {
>>>>>>         "id":"38836",
>>>>>>         "_version_":1542258196387725312},
>>>>>>       {
>>>>>>         "id":"63896",
>>>>>>         "_version_":1542258196388773888},
>>>>>>       {
>>>>>>         "id":"65406",
>>>>>>         "_version_":1542258196391919616},
>>>>>>       {
>>>>>>         "id":"1357173",
>>>>>>         "_version_":1542258196391919617},
>>>>>>       {
>>>>>>         "id":"5339763",
>>>>>>         "_version_":1542258196392968192},
>>>>>>       {
>>>>>>         "id":"9932722",
>>>>>>         "_version_":1542258196392968193},
>>>>>>       {
>>>>>>         "id":"9217299",
>>>>>>         "_version_":1542258196392968194}]
>>>>>>   }}
>>>>>>
>>>>>> data_search.xml (8 rows)
>>>>>>
>>>>>>
>>>>>>
>>>>>> The URL I am hitting (with a custom dataurl parameter):
>>>>>>
>>>>>> curl 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import&commit=true&dataurl=/code/solr/data/search/dih/data_search.xml'
>>>>>>
>>>>>> I changed my data to use <add> <doc> <field> and used the bin/post tool,
>>>>>> and this is working as expected.
>>>>>> Now I would like to make it work with the DataImportHandler.
>>>>>> How can I use the DataImportHandler to import my documents?
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre Caserta
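
The thread notes that the Body field is indexed once it exists in the schema, and that a managed schema can be edited through the API. A minimal sketch of that route, adding the fields up front with the Schema API before running the import, might look like the following. The host, port, and core name come from the thread; the `text_general` field type and the helper names `add_field_payload` and `add_field` are assumptions for illustration only, so adjust them to whatever your configset actually defines.

```shell
#!/bin/sh
# Base URL of the core; host, port, and core name are taken from the thread.
SOLR_URL="${SOLR_URL:-http://192.168.99.100:8999/solr/solrexchange}"

# Hypothetical helper: build a Schema API "add-field" JSON payload.
#   $1 = field name, $2 = field type (must already exist in the schema)
add_field_payload() {
  printf '{"add-field":{"name":"%s","type":"%s","stored":true,"indexed":true}}' "$1" "$2"
}

# Hypothetical helper: POST the payload to the core's /schema endpoint.
add_field() {
  curl -sS -X POST -H 'Content-Type: application/json' \
       --data-binary "$(add_field_payload "$1" "$2")" \
       "$SOLR_URL/schema"
}

# Usage (needs a running Solr with a mutable managed schema):
#   add_field body text_general
#   add_field title text_general
# then re-run the dataimport full-import command.
```

This only covers fields you know in advance; it does not make DIH guess field types the way schemaless mode does.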