: Update: It seems I get the bad behavior (no documents returned) when the
: length of a value in the StrField is greater than or equal to 32,767
: (2^15). Is this some type of bit overflow somewhere?

IIRC there is a limit in the lower level lucene code to how many bytes a
single term can be -- but I don't remember off the top of my head where
that's enforced.

: > However, if I query 'someFieldName_2:*' and someFieldName_2 has values
: > with length ~60k, I don't get back any documents. Even though I *know* that
: > many documents have a value in someFieldName_2.

First off: don't do a query like that. You are asking for a prefix query
using an empty prefix -- that's *hugely* inefficient. If your goal is to
find all docs that have some value indexed in the field, then add a
"has_someFieldName_2" boolean field and query for
has_someFieldName_2:true, or if you really can't change your index use
someFieldName_2:[* TO *]

(If the only thing you are querying on is whether that field has some
values, then you can make someFieldName_2 stored but not indexed and save
a *ton* of space in your index.)

That said: I'm also surprised by your description of the problem --
specifically that having *any* terms over that length causes a prefix
query like this to not match any docs at all. I would have expected you
to get some errors for the large terms when indexing, and then at query
time it would only match the docs with the shorter values.

What I'm seeing is that the long terms are silently ignored, but the
prefix query across the field will still match docs with shorter terms.

I'll open a bug to figure out why we aren't generating an error for this
at index time, but the behavior at query time looks correct....

hossman@frisbee:~$ perl -le 'print "a,aaa"; print "z," . ("Z" x 32767);' | curl 'http://localhost:8983/solr/update?header=false&fieldnames=name,long_s&rowid=id&commit=true' -H 'Content-Type: application/csv' --data-binary @-
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">572</int></lst>
</response>
hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=*:*&fl=id,name&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "fl":"id,name",
      "indent":"true",
      "q":"*:*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "name":"a",
        "id":"0"},
      {
        "name":"z",
        "id":"1"}]
  }}
hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=long_s:*&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "indent":"true",
      "q":"long_s:*",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "name":"a",
        "long_s":"aaa",
        "id":"0",
        "_version_":1459225819107819520}]
  }}


-Hoss
http://www.lucidworks.com/
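
A rough client-side sketch of the "has_someFieldName_2" suggestion, in case it helps: before sending docs to Solr, add the boolean flag and note which values are too long to be indexed as a single term. Everything here is illustrative. The helper name prepare_doc is made up, and the cutoff of 32,766 UTF-8 bytes is an assumption that happens to match the 32,767 threshold reported above, so check it against your Lucene version before relying on it.

```python
# Assumed cutoff: terms longer than this many UTF-8 bytes are not indexed.
# This value is an assumption for illustration; verify it for your Lucene version.
MAX_TERM_BYTES = 32766


def prepare_doc(doc, field):
    """Return (new_doc, oversized) for one document.

    new_doc is a copy of doc with a boolean "has_<field>" flag added, so that
    "does this doc have a value?" can be answered with has_<field>:true
    instead of a prefix query on the field itself.
    oversized is True when the value exceeds MAX_TERM_BYTES once encoded as
    UTF-8 (bytes, not characters, are what count).
    """
    value = doc.get(field)
    has_value = isinstance(value, str) and value != ""
    n_bytes = len(value.encode("utf-8")) if has_value else 0

    new_doc = dict(doc)  # leave the caller's dict untouched
    new_doc["has_" + field] = has_value
    return new_doc, n_bytes > MAX_TERM_BYTES


# The same two documents as in the CSV example above.
docs = [
    {"id": "0", "name": "a", "long_s": "aaa"},
    {"id": "1", "name": "z", "long_s": "Z" * 32767},
]

for d in docs:
    prepared, oversized = prepare_doc(d, "long_s")
    print(prepared["id"], prepared["has_long_s"], oversized)
```

With the flag in place, both docs can be found with has_long_s:true even though the long value in doc 1 never makes it into the long_s terms. The oversized result is only a hint for logging or for deciding to make the field stored-only.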