: Update: It seems I get the bad behavior (no documents returned) when the
: length of a value in the StrField is greater than or equal to 32,767
: (2^15). Is this some type of bit overflow somewhere?

IIRC there is a limit in the lower-level Lucene code on how many bytes a 
single term can be -- but i don't remember off the top of my head where 
that's enforced.
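
A minimal sketch of a client-side guard, in case it helps while the exact 
enforcement point is tracked down. It assumes the cap is 32,766 bytes of 
UTF-8, which is believed to be the value of Lucene's 
IndexWriter.MAX_TERM_LENGTH (please verify against your version). The names 
ASSUMED_MAX_TERM_BYTES and exceeds_term_limit are made up for illustration. 
The limit is counted in bytes, not characters, so non-ASCII values reach it 
sooner.

```python
# Assumption: 32766 bytes, believed to match Lucene's
# IndexWriter.MAX_TERM_LENGTH. Verify against your Lucene version.
ASSUMED_MAX_TERM_BYTES = 32766


def exceeds_term_limit(value, limit=ASSUMED_MAX_TERM_BYTES):
    """Return True if value's UTF-8 encoding is longer than limit bytes."""
    return len(value.encode("utf-8")) > limit


if __name__ == "__main__":
    # Same two values as a short term and a 32767-character term.
    for label, value in [("short", "aaa"), ("long", "Z" * 32767)]:
        print(label, exceeds_term_limit(value))
```

Running a check like this before sending documents would at least flag the 
values that are likely to be dropped from a StrField.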

: > However, if I query 'someFieldName_2:*' and someFieldName_2 has values
: > with length ~60k, I don't get back any documents. Even though I *know* that
: > many documents have a value in someFieldName_2.

first off: don't do a query like that.  you are asking for a prefix query 
using an empty prefix -- that's *hugely* inefficient.  if your goal is to 
find all docs that have some value indexed in the field, then add a 
"has_someFieldName_2" boolean field and query for 
has_someFieldName_2:true, or if you really can't change your index use 
someFieldName_2:[* TO *]

(if the only thing you are querying on is whether that field has some 
values, then you can make someFieldName_2 stored but not indexed and save 
a *ton* of space in your index)
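
A rough sketch of the "has_" field idea, assuming documents are built as 
plain dicts on the client before being sent to Solr and that a boolean 
has_someFieldName_2 field exists in the schema. The helper names 
add_presence_flag and select_url are hypothetical, and the base URL is just 
the default local Solr address.

```python
from urllib.parse import urlencode

# Assumption: default local Solr select handler; adjust for your setup.
SELECT_URL = "http://localhost:8983/solr/select"


def add_presence_flag(doc, field):
    """Return a copy of doc with a boolean has_<field> entry.

    The flag is True when the field is present and not empty.
    """
    flagged = dict(doc)
    flagged["has_" + field] = doc.get(field) not in (None, "", [])
    return flagged


def select_url(query, base=SELECT_URL):
    """Build a select URL with the query string properly URL-encoded."""
    return base + "?" + urlencode({"q": query, "wt": "json"})


if __name__ == "__main__":
    doc = add_presence_flag({"id": "0", "someFieldName_2": "aaa"},
                            "someFieldName_2")
    print(doc)
    # Preferred: cheap term query on the boolean flag.
    print(select_url("has_someFieldName_2:true"))
    # Fallback when the index can't change: open-ended range query.
    print(select_url("someFieldName_2:[* TO *]"))
```

The range form contains spaces and brackets, so letting urlencode handle the 
escaping avoids quoting mistakes on the command line.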


that said: i'm also surprised by your description of the problem -- 
specifically that having *any* terms over that length causes a prefix 
query like this to not match any docs at all.  I would have expected you 
to get some errors for the large terms when indexing, and then at query 
time it would only match the docs with the shorter values.

What i'm seeing is that the long terms are silently ignored, but the 
prefix query across the field will still match docs with shorter terms.

i'll open a bug to figure out why we aren't generating an error for this 
at index time, but the behavior at query time looks correct....

hossman@frisbee:~$ perl -le 'print "a,aaa"; print "z," . ("Z" x 32767);' | 
    curl 'http://localhost:8983/solr/update?header=false&fieldnames=name,long_s&rowid=id&commit=true' \
    -H 'Content-Type: application/csv' --data-binary @- 

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">572</int></lst>
</response>

hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=*:*&fl=id,name&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "fl":"id,name",
      "indent":"true",
      "q":"*:*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "name":"a",
        "id":"0"},
      {
        "name":"z",
        "id":"1"}]
  }}


hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=long_s:*&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "indent":"true",
      "q":"long_s:*",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "name":"a",
        "long_s":"aaa",
        "id":"0",
        "_version_":1459225819107819520}]
  }}
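
For anyone who would rather reproduce this without perl, here is a rough 
Python equivalent of the indexing step. It is only a sketch: the update URL 
is copied from the curl command above, and build_csv_payload and post_csv 
are illustrative names, not part of any Solr client library.

```python
import csv
import io
import urllib.request

# URL copied from the curl command above; adjust host/core for your setup.
UPDATE_URL = ("http://localhost:8983/solr/update"
              "?header=false&fieldnames=name,long_s&rowid=id&commit=true")


def build_csv_payload(long_len=32767):
    """Build the same two rows as the perl one-liner: one short value
    and one value of long_len characters."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["a", "aaa"])
    writer.writerow(["z", "Z" * long_len])
    return buf.getvalue().encode("utf-8")


def post_csv(payload, url=UPDATE_URL):
    """POST the CSV body to Solr and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    print(post_csv(build_csv_payload()).decode("utf-8"))
```

Changing long_len makes it easy to probe values on either side of the 
length where the term stops being indexed.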






-Hoss
http://www.lucidworks.com/
