Hi All, Greetings for the day.
I am using Solr 5.3 and trying to import a Wikipedia page-article dump <https://dumps.wikimedia.org/enwiki/20150805/enwiki-20150805-pages-articles1.xml-p000000010p000010000.bz2> into Solr using the DataImportHandler, but I get only the id and title fields back when I query.

Below is my data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/mnt/TEST/enwiki-20150602-pages-articles1.xml"
            transformer="RegexTransformer,DateFormatTransformer" >
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="revision" xpath="/mediawiki/page/revision/id" />
      <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
      <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
  </document>
</dataConfig>

I have also added the entries below to schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="string" indexed="true" stored="false"/>
<field name="revision" type="int" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true"/>
<field name="userId" type="int" indexed="true" stored="true"/>
<field name="text" type="text_en" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true"/>
<field name="titleText" type="text_en" indexed="true" stored="true"/>

I copied schema.xml from "example/example-DIH/solr/solr/conf/schema.xml" and removed all field entries, with a few exceptions as mentioned in the comments.
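As a sanity check, the column names in data-config.xml can be compared against the field names declared in schema.xml. The following is only a minimal Python sketch under some assumptions: the two fragments above are pasted in as strings (trimmed here to the attributes the check needs), the schema fields are wrapped in a <fields> element so they parse as one document, and the helper name missing_columns is hypothetical, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the data-config.xml entity shown above (assumption:
# only the "column" attributes matter for this check).
DATA_CONFIG = """
<dataConfig>
  <document>
    <entity name="page" processor="XPathEntityProcessor">
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="revision" xpath="/mediawiki/page/revision/id" />
      <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
      <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" />
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text" />
    </entity>
  </document>
</dataConfig>
"""

# Schema field entries from above, wrapped in <fields> so they form
# a single well-formed XML document.
SCHEMA_FIELDS = """
<fields>
  <field name="id" type="string" />
  <field name="title" type="string" />
  <field name="revision" type="int" />
  <field name="user" type="string" />
  <field name="userId" type="int" />
  <field name="text" type="text_en" />
  <field name="timestamp" type="date" />
  <field name="titleText" type="text_en" />
</fields>
"""


def missing_columns(data_config_xml, schema_fields_xml):
    """Return DIH columns that have no schema field of the same name.

    Columns starting with "$" (such as $skipDoc) are DIH control
    columns rather than index fields, so they are skipped.
    """
    config = ET.fromstring(data_config_xml)
    schema = ET.fromstring(schema_fields_xml)
    defined = {f.get("name") for f in schema.iter("field")}
    columns = [f.get("column") for f in config.iter("field")]
    return [c for c in columns if not c.startswith("$") and c not in defined]


print(missing_columns(DATA_CONFIG, SCHEMA_FIELDS))  # prints []
```

An empty list only means that every imported column has a matching field in the pasted fragments; it says nothing about which schema file the running core actually loaded.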
After importing the data I try to fetch all fields, but I get only "id" and "title". I also tried running the import in debug mode to get more information about indexing, but whenever I select debug mode it imports only 2 documents. I am not sure why, and because of this I cannot debug the indexing process. Please guide me further.

EDIT: I am now sure that the other fields are not getting indexed, because when I specify df=user or df=text I get the message below:

"msg": "undefined field user",

I am querying like this:

http://localhost:8983/solr/wiki/select?q=*%3A*&fl=id%2Ctitle%2Ctext%2Crevision&wt=json&indent=true&debugQuery=true

--
Regards
Gaurav Pant
+91-7709196607