Hi all, I am running a faceted search on a Solr field that contains URLs, solely to locate duplicate URLs in my documents.
However, the Solr response I get looks like this:

    public 'com' => int 492198
    public 'flickr' => int 492198
    public 'http' => int 492198
    public 'www' => int 253881
    public 'photo' => int 253843
    public 'n' => int 253318
    public 'httpwwwflickrcomphoto' => int 253316
    public 'farm' => int 238317
    public 'httpfarm' => int 238317
    public 'jpg' => int 238317
    public 'static' => int 238317
    public 'staticflickrcom' => int 238317
    public '5' => int 237939
    public '00' => int 61009
    public 'b' => int 59463
    public 'c' => int 59094
    public 'f' => int 59004
    public 'd' => int 58995
    public 'e' => int 58818
    public 'a' => int 58327
    public '08' => int 33797
    public '06' => int 33341
    public '04' => int 29902
    public '02' => int 29224
    public '2' => int 26671
    public '4' => int 26613
    public '6' => int 26606
    public '03' => int 26506
    public '1' => int 26389
    public '8' => int 26384

I expected each key to be an entire URL, but each key is only a fragment of one. Is this because characters such as :// in http:// cannot be used in variable names? If so, is there any workaround, or an alternative way to detect duplicates?

Thanks,
Christos
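Faceting counts indexed terms, not stored values. So if the URL field uses a tokenized text type (an assumption, since the schema isn't shown), each URL is split into fragments such as 'http', 'flickr' and 'com' before counting, which matches the output above. The sketch below is illustrative only: `approximate_tokens` is a hypothetical, rough stand-in for such an analyzer, not Solr's actual one; `url_exact` is a hypothetical untokenized (string-typed) copy of the URL field, for example populated with copyField; and the parser assumes Solr's default JSON facet layout, a flat list alternating values and counts.

```python
import re
from urllib.parse import urlencode


def approximate_tokens(url):
    """Hypothetical stand-in for a word-splitting analyzer (NOT Solr's real one):
    lowercase the URL, then keep runs of letters or runs of digits."""
    return re.findall(r"[a-z]+|[0-9]+", url.lower())


def duplicate_facet_params(field="url_exact"):
    """Query string for a facet request returning only values that occur in
    two or more documents. 'url_exact' is a hypothetical string-typed field."""
    return urlencode({
        "q": "*:*",
        "rows": 0,              # no documents needed, only facet counts
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,    # drop values that appear only once
        "facet.limit": -1,      # return all values, not just the first 100
        "wt": "json",
    })


def duplicate_urls(response, field="url_exact"):
    """Convert the flat JSON facet list [value, count, value, count, ...]
    into a {url: count} dict, keeping only counts >= 2."""
    flat = response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return {value: count for value, count in pairs if count >= 2}


if __name__ == "__main__":
    # A tokenized field would index fragments like these, not the whole URL.
    print(approximate_tokens("http://farm5.static.flickr.com/4/1234_abcd.jpg"))
    print(duplicate_facet_params())
    sample = {"facet_counts": {"facet_fields": {"url_exact": [
        "http://www.flickr.com/photos/a/1/", 3,
        "http://www.flickr.com/photos/b/2/", 2,
        "http://www.flickr.com/photos/c/3/", 1,
    ]}}}
    print(duplicate_urls(sample))
```

With facet.mincount=2 on an untokenized field, the facet list itself is the set of duplicated URLs; facet.limit=-1 lifts the default cap of 100 values so none are silently omitted.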