Re: DataImportHandler not indexing all the records

Shalin Shekhar Mangar Sat, 15 Nov 2008 11:21:48 -0800

I think the problem is that DIH catches Exception but not Error, so a
StackOverflowError will slip past it. Normally, the SolrDispatchFilter will
log such errors, but the import is performed in a new thread, so the error is
not logged anywhere. However, DIH will not commit documents in this case
(and there is no mention of a commit in your DIH status).


We should change the catch clause to catch Throwable so that this is not
repeated. I'll open an issue and give a patch.
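
To illustrate the difference, here is a minimal standalone sketch (not the
actual DIH code; the class and method names are made up for this example).
A worker thread runs a task that overflows the stack. With catch (Exception)
the worker dies without recording anything; with catch (Throwable) the
failure is recorded.

```java
import java.util.concurrent.atomic.AtomicReference;

class CatchThrowableDemo {

    // Hypothetical stand-in for a step that recurses too deeply,
    // which is what a backtracking regex can do on a large input.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    // Runs the failing task in a new thread and returns the status
    // the worker managed to record before it ended.
    static String runInThread(boolean catchThrowable) {
        AtomicReference<String> status = new AtomicReference<>("no status recorded");
        Thread worker = new Thread(() -> {
            if (catchThrowable) {
                try {
                    recurse(0);
                    status.set("completed");
                } catch (Throwable t) {
                    // StackOverflowError is an Error, so it is caught here.
                    status.set("failed: " + t.getClass().getSimpleName());
                }
            } else {
                try {
                    recurse(0);
                    status.set("completed");
                } catch (Exception e) {
                    // Never reached: Error is not a subclass of Exception.
                    status.set("failed: " + e.getClass().getSimpleName());
                }
            }
        });
        // Suppress the JVM's default stderr stack trace to keep the demo quiet.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return status.get();
    }

    public static void main(String[] args) {
        System.out.println("catch Exception: " + runInThread(false));
        System.out.println("catch Throwable: " + runInThread(true));
    }
}
```

In the first case the status stays at its initial value, which matches the
symptom in the quoted status output: rows fetched, but no completion or
commit entry.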

Btw, Ahmed, Solr has a Tokenizer which is much better at stripping HTML --
HTMLStripWhitespaceTokenizerFactory, which you can use for such tasks.
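
If you do want to keep a regex for this, the pattern itself is a likely
cause of the overflow: as far as I know, java.util.regex evaluates a
repeated group such as (.|\n)*? recursively, so the stack depth grows with
the length of the text being matched. A rough sketch of an alternative,
assuming tags never contain a literal '>' inside attribute values (the class
and method names are only for illustration):

```java
import java.util.regex.Pattern;

class StripTagsDemo {

    // A negated character class is matched in a loop rather than by
    // recursing once per character, and it also matches newlines.
    // Assumption: no '>' appears inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replaces every tag-like span with a single space, as the quoted
    // replaceWith=" " setting does.
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

This is not a real HTML parser, so comments, scripts and malformed markup
will not be handled well; the tokenizer is still the better choice.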

On Sun, Nov 16, 2008 at 12:30 AM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:

> I had a problem similar to Giri's. I have 17,000 records in one table and
> DIH can import only 12464.
>
> After some investigation, I found my problem.
>
> I have a regular expression to strip off HTML tags from input text, as
> follows:
>
> <field sourceColName="content" column="content" regex="&lt;(.|\n)*?&gt;"
> replaceWith=" "/>
>
> The DIH RegEx has a stack overflow on the record 17,000 due to an error in
> the content, and then DIH exits without any error in the log or in the
> status command. Here is the status:
>
> <lst name="statusMessages">
> <str name="Time Elapsed">0:0:31.657</str>
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">12464</str>
> <str name="Total Documents Processed">12464</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2008-11-15 20:40:58</str>
> </lst>
>
> I found the error in Eclipse Console window while debugging; it was a stack
> overflow in the RegEx library.
>
> The problem is that DIH does not show any problem in the log file or in the
> status message.
> What I think is important is to show whatever error happens in the log file.
>
> I also noticed that, when there is no error, a log message shows completion:
>
> Nov 15, 2008 8:57:34 PM org.apache.solr.handler.dataimport.DocBuilder
> execute
> INFO: Time taken = 0:0:40.656
>
> In case of RegEx stack overflow error, this log message does not appear.
>
> I am researching how to catch such errors in DIH. Any ideas?
>
>
> Regards,
> ahmd
>
> On Sat, Nov 15, 2008 at 6:32 AM, Noble Paul നോബിള്‍ नोब्ळ् <
> [EMAIL PROTECTED]> wrote:
>
> > There is no obvious problem
> >
> > I can be reasonably sure that
> > the query
> >
> > select * from climatedata.ws_record limit 1000000
> >
> > would have fetched only  615360 rows.
> > This is a very reliable piece of information
> > <str name="Total Rows Fetched">615360</str>
> >
> > On Sat, Nov 15, 2008 at 12:41 AM, Giri <[EMAIL PROTECTED]> wrote:
> > > Hi Noble,
> > > thanks for the help, here are the details: the field "id" is unique,
> when
> > I
> > > did a select distinct(id), it returned 1 million rows.
> > >
> > > -------------------------------------------------------------------
> > > db-data-config.xml
> > > note: I limit the resultset to 1 million in the select query
> > > -------------------------------------------------------------------
> > > <dataConfig>
> > >    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
> > > url="jdbc:mysql://localhost:3306/climatedata" user="user" password="pw"
> > > batchSize ="-1"/>
> > >    <document name="climateRecord">
> > >        <entity name="observation" query="select * from
> > > climatedata.ws_record limit 1000000">
> > >            <field column="id" name="id" />
> > >            <field column="inst_code" name="inst_code" />
> > >            <field column="inst_name" name="inst_name" />
> > >            <field column="meas_name" name="meas_name" />
> > >            <field column="latitude" name="latitude" />
> > >            <field column="longitude" name="longitude" />
> > >            <field column="ob_id" name="ob_id" />
> > >            <field column="in_id" name="in_id" />
> > >            <field column="ob_name" name="ob_name" />
> > >         </entity>
> > >    </document>
> > > </dataConfig>
> > >
> > > -----------------------------------------------------------------
> > > in the solr Schema.xml:
> > > ----------------------------------------------------------------
> > > <fields>
> > >       <field name="id" type="string" indexed="true" stored="true"
> > > multiValued="false"/>
> > >    <field name="inst_code" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >    <field name="inst_name" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >    <field name="meas_name" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >        <field name="latitude" type="sfloat" class="solr.FloatField"
> > > indexed="true" stored="true"  required="false"/>
> > >    <field name="longitude" type="sfloat" class="solr.FloatField"
> > > indexed="true" stored="true"  required="false"/>
> > >    <field name="ob_id" type="string" indexed="true" stored="true"
> > > multiValued="true"/>
> > >    <field name="in_id" type="string" indexed="true" stored="true"
> > > multiValued="true"/>
> > >    <field name="ob_name" type="text" indexed="true" stored="true"
> > > multiValued="true"/>
> > >
> > >   <!-- catchall field, containing all other searchable text fields
> > > (implemented
> > >        via copyField further on in this schema  -->
> > >   <field name="text" type="text" indexed="true" stored="false"
> > > multiValued="true" required="false"/>
> > >
> > >   <!-- non-tokenized version of manufacturer to make it easier to sort
> or
> > > group
> > >        results by manufacturer.  copied from "manu" via copyField -->
> > >   <field name="manu_exact" type="string" indexed="true" stored="false"
> > > required="false"/>
> > >
> > >
> > >   <!-- Dynamic field definitions.  If a field name is not found,
> > > dynamicFields
> > >        will be used if the name matches any of the patterns.
> > >        RESTRICTION: the glob-like pattern in the name attribute must
> have
> > >        a "*" only at the start or the end.
> > >        EXAMPLE:  name="*_i" will match any field ending in _i (like
> > myid_i,
> > > z_i)
> > >        Longer patterns will be matched first.  if equal size patterns
> > >        both match, the first appearing in the schema will be used.  -->
> > >   <dynamicField name="*_i"  type="sint"    indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_s"  type="string"  indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_l"  type="slong"   indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_t"  type="text"    indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_b"  type="boolean" indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_f"  type="sfloat"  indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_d"  type="sdouble" indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_dt" type="date"    indexed="true"
> >  stored="true"/>
> > >  </fields>
> > >
> > > ----------------------------------------------------
> > > I run the index via  firefox browser using
> > > http://localhost:8080/solr/dataimport?command=full-import
> > > I checked the status using
> > > http://localhost:8080/solr/dataimport?command=status
> > > initially the status increased steadily, but after reaching 613071, the
> > > status stayed for a while (as below), and then it displayed the
> completed
> > > message :
> > > ----------------------------------------------------
> > > <response>
> > > -
> > > <lst name="responseHeader">
> > > <int name="status">0</int>
> > > <int name="QTime">1</int>
> > > </lst>
> > > -
> > > <lst name="initArgs">
> > > -
> > > <lst name="defaults">
> > > <str name="config">db-data-config.xml</str>
> > > </lst>
> > > </lst>
> > > <str name="command">status</str>
> > > <str name="status">busy</str>
> > > <str name="importResponse">A command is still running...</str>
> > > -
> > > <lst name="statusMessages">
> > > <str name="Time Elapsed">0:3:24.266</str>
> > > <str name="Total Requests made to DataSource">1</str>
> > > <str name="Total Rows Fetched">613071</str>
> > > <str name="Total Documents Processed">613070</str>
> > > <str name="Total Documents Skipped">0</str>
> > > <str name="Full Dump Started">2008-11-14 12:12:16</str>
> > > </lst>
> > > -
> > > <str name="WARNING">
> > > This response format is experimental.  It is likely to change in the
> > future.
> > > </str>
> > > </response>
> > >
> > > -----------------------------------------------------------
> > >
> > >>>NOTE: this is the status result after it completed
> > > -----------------------------------------------------------
> > >
> > > <response>
> > > -
> > > <lst name="responseHeader">
> > > <int name="status">0</int>
> > > <int name="QTime">1</int>
> > > </lst>
> > > -
> > > <lst name="initArgs">
> > > -
> > > <lst name="defaults">
> > > <str name="config">db-data-config.xml</str>
> > > </lst>
> > > </lst>
> > > <str name="command">status</str>
> > > <str name="status">idle</str>
> > > <str name="importResponse"/>
> > > -
> > > <lst name="statusMessages">
> > > <str name="Total Requests made to DataSource">1</str>
> > > <str name="Total Rows Fetched">615360</str>
> > > <str name="Total Documents Skipped">0</str>
> > > <str name="Full Dump Started">2008-11-14 12:12:16</str>
> > > -
> > > <str name="">
> > > Indexing completed. Added/Updated: 615360 documents. Deleted 0
> documents.
> > > </str>
> > > <str name="Committed">2008-11-14 12:16:32</str>
> > > <str name="Optimized">2008-11-14 12:16:32</str>
> > > <str name="Time taken ">0:4:16.154</str>
> > > </lst>
> > > -
> > > <str name="WARNING">
> > > This response format is experimental.  It is likely to change in the
> > future.
> > > </str>
> > > </response>
> > >
> > > -----------------------------------------------------
> > >
> > > here is the full solr schema.xml content:
> > > ----------------------------------------------------
> > > <?xml version="1.0" ?>
> > > <!-- The Solr schema file. This file should be named "schema.xml" and
> > >  should be in the conf directory under the solr home
> > >  (i.e. ./solr/conf/schema.xml by default)
> > >  or located where the classloader for the Solr webapp can find it.
> > >
> > >  For more information, on how to customize this file, please see...
> > >  http://wiki.apache.org/solr/SchemaXml
> > > -->
> > >
> > > <schema name="example" version="1.1">
> > >  <types>
> > >    <!-- field type definitions. The "name" attribute is
> > >         just a label to be used by field definitions.  The "class"
> > >         attribute and any other attributes determine the real
> > >         behavior of the fieldtype.  -->
> > >
> > >    <!-- The StringField type is not analyzed, but indexed/stored
> verbatim
> > > -->
> > >    <fieldtype name="string" class="solr.StrField"
> > sortMissingLast="true"/>
> > >
> > >    <!-- boolean type: "true" or "false" -->
> > >    <fieldtype name="boolean" class="solr.BoolField"
> > > sortMissingLast="true"/>
> > >
> > >    <!-- The optional sortMissingLast and sortMissingFirst attributes
> are
> > >         currently supported on types that are sorted internally as a
> > > strings.
> > >       - If sortMissingLast="true" then a sort on this field will cause
> > > documents
> > >       without the field to come after documents with the field,
> > >       regardless of the requested sort order (asc or desc).
> > >       - If sortMissingFirst="true" then a sort on this field will cause
> > > documents
> > >       without the field to come before documents with the field,
> > >       regardless of the requested sort order.
> > >       - If sortMissingLast="false" and sortMissingFirst="false" (the
> > > default),
> > >       then default lucene sorting will be used which places docs
> without
> > > the field
> > >       first in an ascending sort and last in a descending sort.
> > >    -->
> > >
> > >    <!-- numeric field types that store and index the text
> > >         value verbatim (and hence don't support range queries since the
> > >         lexicographic ordering isn't equal to the numeric ordering) -->
> > >    <fieldtype name="integer" class="solr.IntField"/>
> > >    <fieldtype name="long" class="solr.LongField"/>
> > >    <fieldtype name="float" class="solr.FloatField"/>
> > >    <fieldtype name="double" class="solr.DoubleField"/>
> > >
> > >
> > >    <!-- Numeric field types that manipulate the value into
> > >         a string value that isn't human readable in it's internal form,
> > >         but with a lexicographic ordering the same as the numeric
> > ordering
> > >         so that range queries correctly work. -->
> > >    <fieldtype name="sint" class="solr.SortableIntField"
> > > sortMissingLast="true"/>
> > >    <fieldtype name="slong" class="solr.SortableLongField"
> > > sortMissingLast="true"/>
> > >    <fieldtype name="sfloat" class="solr.SortableFloatField"
> > > sortMissingLast="true"/>
> > >    <fieldtype name="sdouble" class="solr.SortableDoubleField"
> > > sortMissingLast="true"/>
> > >
> > >
> > >    <!-- The format for this date field is of the form
> > 1995-12-31T23:59:59Z,
> > > and
> > >         is a more restricted form of the canonical representation of
> > > dateTime
> > >         http://www.w3.org/TR/xmlschema-2/#dateTime
> > >         The trailing "Z" designates UTC time and is mandatory.
> > >         Optional fractional seconds are allowed:
> 1995-12-31T23:59:59.999Z
> > >         All other components are mandatory. -->
> > >    <fieldtype name="date" class="solr.DateField"
> sortMissingLast="true"/>
> > >
> > >    <!-- solr.TextField allows the specification of custom text
> analyzers
> > >         specified as a tokenizer and a list of token filters. Different
> > >         analyzers may be specified for indexing and querying.
> > >
> > >         The optional positionIncrementGap puts space between multiple
> > > fields of
> > >         this type on the same document, with the purpose of preventing
> > > false phrase
> > >         matching across fields.
> > >
> > >         For more info on customizing your analyzer chain, please see...
> > >      http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> > >
> > >     -->
> > >
> > >     <!-- Standard analyzer commonly used by Lucene developers
> > >     -->
> > >    <!-- Standard analyzer commonly used by Lucene developers -->
> > >    <fieldtype name="text_lu" class="solr.TextField"
> > > positionIncrementGap="100">
> > >      <analyzer>
> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
> > >        <filter class="solr.StandardFilterFactory"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.StopFilterFactory"/>
> > >        <filter class="solr.EnglishPorterFilterFactory"/>
> > >      </analyzer>
> > >    </fieldtype>
> > >    <!-- One could also specify an existing Analyzer implementation in
> > Java
> > >         via the class attribute on the analyzer element:
> > >    <fieldtype name="text_lu" class="solr.TextField">
> > >      <analyzer
> > > class="org.apache.lucene.analysis.snowball.SnowballAnalyzer"/>
> > >    </fieldType>
> > >    -->
> > >
> > >    <!-- A text field that only splits on whitespace for more exact
> > matching
> > > -->
> > >    <fieldtype name="text_ws" class="solr.TextField"
> > > positionIncrementGap="100">
> > >      <analyzer>
> > >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >      </analyzer>
> > >    </fieldtype>
> > >
> > >    <!-- A text field that uses WordDelimiterFilter to enable splitting
> > and
> > > matching of
> > >        words on case-change, alpha numeric boundaries, and
> > non-alphanumeric
> > > chars
> > >        so that a query of "wifi" or "wi fi" could match a document
> > > containing "Wi-Fi".
> > >        Synonyms and stopwords are customized by external files, and
> > > stemming is enabled -->
> > >    <fieldtype name="text" class="solr.TextField"
> > > positionIncrementGap="100">
> > >      <analyzer type="index">
> > >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >          <!-- in this example, we will only use synonyms at query time
> > >          <filter class="solr.SynonymFilterFactory"
> > > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> > >          -->
> > >          <!--<filter class="solr.WordDelimiterFilterFactory"
> > > generateWordParts="1"/>-->
> > >          <filter class="solr.StopFilterFactory" ignoreCase="true"/>
> > >          <filter class="solr.LowerCaseFilterFactory"/>
> > >      </analyzer>
> > >      <analyzer type="query">
> > >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >          <filter class="solr.StopFilterFactory" ignoreCase="true"/>
> > >          <filter class="solr.LowerCaseFilterFactory"/>
> > >      </analyzer>
> > >    </fieldtype>
> > >
> > >    <!-- Less flexible matching, but less false matches.  Probably not
> > ideal
> > > for product names
> > >         but may be good for SKUs.  Can insert dashes in the wrong place
> > and
> > > still match. -->
> > >    <fieldtype name="textTight" class="solr.TextField"
> > > positionIncrementGap="100" >
> > >      <analyzer>
> > >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >        <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt"
> > > ignoreCase="true" expand="false"/>
> > >        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
> > >        <filter class="solr.WordDelimiterFilterFactory"
> > > generateWordParts="0" generateNumberParts="0" catenateWords="1"
> > > catenateNumbers="1" catenateAll="0"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.EnglishPorterFilterFactory"
> > > protected="protwords.txt"/>
> > >      </analyzer>
> > >    </fieldtype>
> > >  </types>
> > >  <fields>
> > >   <!-- Valid attributes for fields:
> > >       name: mandatory - the name for the field
> > >       type: mandatory - the name of a previously defined type from the
> > > <types> section
> > >       indexed: true if this field should be indexed (searchable)
> > >       stored: true if this field should be retrievable
> > >       multiValued: true if this field may contain multiple values per
> > > document
> > >       omitNorms: (expert) set to true to omit the norms associated with
> > > this field
> > >                  (this disables length normalization and index-time
> > > boosting for the field)
> > >   -->
> > >    <field name="id" type="string" indexed="true" stored="true"
> > > multiValued="false"/>
> > >    <field name="inst_code" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >    <field name="inst_name" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >    <field name="meas_name" type="text" indexed="true" stored="true"
> > > multiValued="true" required="false"/>
> > >        <field name="latitude" type="sfloat" class="solr.FloatField"
> > > indexed="true" stored="true"  required="false"/>
> > >    <field name="longitude" type="sfloat" class="solr.FloatField"
> > > indexed="true" stored="true"  required="false"/>
> > >    <field name="ob_id" type="string" indexed="true" stored="true"
> > > multiValued="true"/>
> > >    <field name="in_id" type="string" indexed="true" stored="true"
> > > multiValued="true"/>
> > >    <field name="ob_name" type="text" indexed="true" stored="true"
> > > multiValued="true"/>
> > >
> > >   <!-- catchall field, containing all other searchable text fields
> > > (implemented
> > >        via copyField further on in this schema  -->
> > >   <field name="text" type="text" indexed="true" stored="false"
> > > multiValued="true" required="false"/>
> > >
> > >
> > >   <!-- non-tokenized version of manufacturer to make it easier to sort
> or
> > > group
> > >        results by manufacturer.  copied from "manu" via copyField -->
> > >   <field name="manu_exact" type="string" indexed="true" stored="false"
> > > required="false"/>
> > >
> > >
> > >   <!-- Dynamic field definitions.  If a field name is not found,
> > > dynamicFields
> > >        will be used if the name matches any of the patterns.
> > >        RESTRICTION: the glob-like pattern in the name attribute must
> have
> > >        a "*" only at the start or the end.
> > >        EXAMPLE:  name="*_i" will match any field ending in _i (like
> > myid_i,
> > > z_i)
> > >        Longer patterns will be matched first.  if equal size patterns
> > >        both match, the first appearing in the schema will be used.  -->
> > >   <dynamicField name="*_i"  type="sint"    indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_s"  type="string"  indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_l"  type="slong"   indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_t"  type="text"    indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_b"  type="boolean" indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_f"  type="sfloat"  indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_d"  type="sdouble" indexed="true"
> >  stored="true"/>
> > >   <dynamicField name="*_dt" type="date"    indexed="true"
> >  stored="true"/>
> > >  </fields>
> > >
> > >  <!-- field to use to determine and enforce document uniqueness. -->
> > >  <uniqueKey>id</uniqueKey>
> > >
> > >  <!-- field for the QueryParser to use when an explicit fieldname is
> > absent
> > > -->
> > >  <defaultSearchField>text</defaultSearchField>
> > >
> > >  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> > >  <solrQueryParser defaultOperator="AND"/>
> > >
> > >  <!-- copyField commands copy one field to another at the time a
> document
> > >        is added to the index.  It's used either to index the same field
> > > different
> > >        ways, or to add multiple fields to the same field for
> > easier/faster
> > > searching.  -->
> > >
> > >
> > >
> > >  <!-- Similarity is the scoring routine for each document vs a query.
> > >      A custom similarity may be specified here, but the default is fine
> > >      for most applications.  -->
> > >  <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/>
> > -->
> > >
> > > </schema>
> > >
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> > >
> > >
> > > On Wed, Nov 12, 2008 at 11:01 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > >> the fact that it got committed in the end suggests there was no error
> in
> > >> between
> > >>
> > >> look at the status url and see the no:of rows returned etc.
> > >>
> > >> It gives a clue as to what would have really happened. or you can
> > >> paste your dataconfig and status xmls and we may be able to suggest
> > >> something
> > >>
> > >> On Thu, Nov 13, 2008 at 9:26 AM, Giri <[EMAIL PROTECTED]> wrote:
> > >> > Hi Noble,
> > >> >
> > >> > thanks for reply, my comments are below
> > >> >
> > >> >>>why is the id field multivalued?
> > >> > I was just trying various options, yes, this ID is unique, and I
> check
> > >> for
> > >> > duplicates, when I did a distinct (id) query to the MySQL database,
> it
> > >> > returned almost 2 million.
> > >> >
> > >> >>> look at the status host:post/dataimport gives you the status
> > >> > I constantly checked the status  using the  dataimport URL,  the
> > status
> > >> was
> > >> > increased upto 600K records, then it stopped increasing, then took
> few
> > >> > minutes to commit the indexed data.
> > >> >
> > >> >
> > >> > On Tue, Nov 11, 2008 at 11:35 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> > >> > [EMAIL PROTECTED]> wrote:
> > >> >
> > >> >> why is the id field multivalued? is there a uniqueKey in the schema
> ?
> > >> >> Are you sure there are no duplicates?
> > >> >>
> > >> >> look at the status host:post/dataimport gives you the status
> > >> >> it can give you some clue
> > >> >>
> > >> >> --Noble
> > >> >>
> > >> >>
> > >> >> On Wed, Nov 12, 2008 at 4:53 AM, Giri <[EMAIL PROTECTED]>
> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I have about ~ 2 million records in a mySQL database table (about
> 9
> > >> >> fields
> > >> >> > from a single table), and I am trying to load it to the solr
> using
> > >> >> > DataImportHandler using the command=full-import option. it only
> > >> indexed
> > >> >> > about 615360 records out of 2 millions.
> > >> >> >
> > >> >> > here is my db-data-config.xml
> > >> >> > <dataConfig>
> > >> >> >    <dataSource type="JdbcDataSource"
> driver="com.mysql.jdbc.Driver"
> > >> >> > url="jdbc:mysql://localhost:3306/mydb" user="ua" password="pw"
> > >> batchSize
> > >> >> > ="-1"/>
> > >> >> >    <document name="climate">
> > >> >> >        <entity name="occurence" query="select * from
> mylargetable">
> > >> >> >            <field column="id" name="id" />
> > >> >> >            <field column="title" name="title" />
> > >> >> >            <field column="url" name="url" />
> > >> >> >         </entity>
> > >> >> >    </document>
> > >> >> > </dataConfig>
> > >> >> >
> > >> >> > and in my solr schema.xml, i define these fields as:
> > >> >> >
> > >> >> >    <field name="id" type="string" indexed="true" stored="true"
> > >> >> > multiValued="true"/>
> > >> >> >    <field name="title" type="text" indexed="true" stored="true"
> > >> >> > multiValued="true" required="false"/>
> > >> >> >    <field name="url" type="text" indexed="true" stored="true"
> > >> >> > multiValued="true" required="false"/>
> > >> >> >
> > >> >> >
> > >> >> > If I try to index just one field (id), then it indexes about
> 960000
> > >> >> records,
> > >> >> > but if I try to index all the above three fields, it indexes only
> > >> 615360
> > >> >> > records.
> > >> >> >
> > >> >> > Any help will be appreciated.
> > >> >> >
> > >> >> > thanks!
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> --Noble Paul
> > >> >>
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> --Noble Paul
> > >>
> > >
> >
> >
> >
> > --
> > --Noble Paul
> >
>



-- 
Regards,
Shalin Shekhar Mangar.
