Spend some time on the admin/analysis page: it shows you what each part of the analysis chain is doing to your data, and it'll save you a world of headaches...
But at a guess, WordDelimiterFilterFactory is your culprit...

Best
Erick

On Thu, Aug 4, 2011 at 6:08 PM, anand sridhar <anand.for...@gmail.com> wrote:
> Ok. After analysis, I traced the reduced result set to the fact that the
> zipcode field is not indexed 'as is', i.e. the zipcode field values are
> broken down into tokens and then stored. Hence, if there are 10 documents
> with zipcode values ranging from 91000 to 91009, the zipcode values are
> not stored as 91000, 91001, etc.; instead, the most common recurring parts
> are grouped together and stored as tokens, resulting in a reduced result
> set.
> The net effect is that I cannot search for a value like 91000, since it's
> not stored as it is.
>
> I suspect this has something to do with the field type the zipcode is
> associated with. Right now, zipcode is a field of type text_general, where
> the StandardTokenizerFactory may be breaking the values into tokens.
> However, I want to store them without tokenizing. What's the best field
> type to do this?
>
> I already explored the String field type, which is supposed to store the
> values as is, but I see that the values are still being tokenized.
>
>
> Thanks,
> Anand
> On Wed, Aug 3, 2011 at 7:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Sorry, I'm on a restricted machine, so I can't get the precise URL. But
>> there's a debug page for DIH that might let you see what the query
>> actually returns. I'd guess one of two things:
>> 1> you aren't getting the number of rows you think.
>> 2> you aren't committing the documents you add.
>>
>> But that's just a guess.
>>
>> Best
>> Erick
>> On Aug 3, 2011 2:15 PM, "anand sridhar" <anand.for...@gmail.com> wrote:
>> > Hi,
>> > I am a newbie to Solr and have been trying to learn how to use
>> > DataImportHandler.
>> > I have a query in data-config.xml that fetches about 5 records when I
>> > run it in SQL Query Manager.
>> > However, when Solr does a full import, it skips 4 records and only
>> > imports 1 record.
>> > What could be the reason for that?
>> >
>> > My data-config.xml looks like this:
>> >
>> > <dataConfig>
>> >   <dataSource type="JdbcDataSource"
>> >               name="GeoService"
>> >               driver="net.sourceforge.jtds.jdbc.Driver"
>> >               url="jdbc:jtds:sqlserver://10.168.50.104/ZipCodeLookup"
>> >               user="sa"
>> >               password="psiuser"/>
>> >   <document>
>> >     <entity name="city"
>> >             query="select ll.cityId as id, ll.zip as zipCode,
>> >                    c.cityName as cityName, st.stateName as state,
>> >                    ct.countryName as country
>> >                    from latlonginfo ll, city c, state st, country ct
>> >                    where ll.cityId = c.cityID and c.stateID = st.stateID
>> >                    and st.countryID = ct.countryID
>> >                    order by ll.areacode"
>> >             dataSource="GeoService">
>> >       <field column="zipCode" name="zipCode"/>
>> >       <field column="cityName" name="cityName"/>
>> >       <field column="state" name="state"/>
>> >       <field column="country" name="country"/>
>> >     </entity>
>> >   </document>
>> > </dataConfig>
>> >
>> > My field definitions in schema.xml look like this:
>> >
>> > <field name="CityName" type="text_general" indexed="true" stored="true" />
>> > <field name="zipCode" type="text_general" indexed="true" stored="true"/>
>> > <field name="state" type="text_general" indexed="true" stored="true" />
>> > <field name="country" type="text_general" indexed="true" stored="true" />
>> >
>> > One observation I made is that the 1 record being indexed is the last
>> > record in the result set. I have verified that no duplicate records
>> > are being retrieved.
>> >
>> > For example, if the result set from the database is:
>> >
>> > zipcode  CityName    state  country
>> > -------  ----------  -----  -------
>> > 91324    Northridge  CA     USA
>> > 91325    Northridge  CA     USA
>> > 91327    Northridge  CA     USA
>> > 91328    Northridge  CA     USA
>> > 91329    Northridge  CA     USA
>> > 91330    Northridge  CA     USA
>> >
>> > the record being indexed is always the last one.
>> >
>> > Any suggestions are welcome.
>> >
>> > Thanks,
>> > Anand
>> >
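A minimal schema.xml sketch for the zipcode question in this thread, assuming the stock Solr 3.x example schema (schema version 1.4, a "string" type backed by solr.StrField, and "id" as the unique key); none of this is confirmed by the messages above. A StrField value is indexed as a single untokenized term, so 91000 stays 91000. A schema change only affects documents indexed after it, so Solr needs a restart (or core reload) and a fresh full-import; if that step was skipped, it may explain why the String type appeared not to help.

The same assumption bears on the original import question. The DIH query maps ll.cityId to id, and every sample row is Northridge, so if id is the unique key the rows probably share one id value. Solr replaces an existing document whenever a new one arrives with the same unique key value, which would leave only the last row. Mapping a per-row value to id (for example "ll.zip as id", assuming each zip appears only once in the result) would avoid that.

```xml
<!-- Trimmed schema.xml excerpt (sketch): other types and fields omitted.
     Assumes the stock example's "string" type and "id" unique key. -->
<schema name="example" version="1.4">
  <types>
    <!-- StrField indexes the whole value as one term; no tokenizer runs -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <!-- Must be distinct per document; a repeated value overwrites the earlier document -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- zipCode as an exact-match string instead of text_general -->
    <field name="zipCode" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
```

With zipCode defined this way and the data reindexed, a query such as zipCode:91000 matches only documents whose zipcode is exactly 91000.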