Re: Duplicates

Peter Karich Fri, 23 Jul 2010 05:07:02 -0700

Pavel,

Hopefully I now understand your use case :-) but one question:


> I need to select always *one* file per folder or
> select *only* folders that contain matched files (without files).

What do you mean by 'or' here? Do you have two use cases, or would one of
them be sufficient?
I ask because the second use case could be solved without the patch: you
could index folders only. Then every prop_N becomes a multivalued field
holding the values of all files in that folder, and you don't have the
problem of duplicate folders.
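
To illustrate the folder-only idea, here is a minimal Python sketch that does not use Solr at all. It folds a subset of the file rows from the example table into one document per folder, with each prop field holding a set of values, and then matches folders directly. The function names `build_folder_docs` and `search_folders` are hypothetical and exist only for this illustration; in a real setup the fields would be declared as multivalued in the schema.

```python
from collections import defaultdict

# Subset of the file rows from the example table (folders 0 and 8 only).
files = [
    {"id": 1, "prop_1": "val1", "prop_N": "valN1", "folderId": 0},
    {"id": 2, "prop_1": "val3", "prop_N": "valN2", "folderId": 0},
    {"id": 3, "prop_1": "val1", "prop_N": "valN3", "folderId": 0},
    {"id": 9, "prop_1": "val2", "prop_N": "valN3", "folderId": 8},
    {"id": 10, "prop_1": "val1", "prop_N": "valN2", "folderId": 8},
    {"id": 11, "prop_1": "val2", "prop_N": "valN5", "folderId": 8},
]


def build_folder_docs(file_rows):
    """Fold file rows into one document per folder with multivalued props."""
    docs = defaultdict(lambda: {"prop_1": set(), "prop_N": set()})
    for row in file_rows:
        docs[row["folderId"]]["prop_1"].add(row["prop_1"])
        docs[row["folderId"]]["prop_N"].add(row["prop_N"])
    return dict(docs)


def search_folders(folder_docs, field, value):
    """Return the ids of folders whose multivalued field contains value."""
    return sorted(fid for fid, doc in folder_docs.items() if value in doc[field])


folder_docs = build_folder_docs(files)
print(search_folders(folder_docs, "prop_1", "val1"))  # [0, 8]
```

Because each folder is a single document, a folder can appear at most once in the result, which is why no duplicate handling is needed in this variant.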

(If you don't mind ugliness, both use cases could even be handled this way:
after you get the folders, grabbing the files that matched could be done
in postprocessing.)
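
The postprocessing step could look roughly like the following Python sketch. It assumes the folder ids have already come back from the folder-only query and that the file rows are available to the client (for example from a second query or the original data source); both are assumptions, as is the helper name `matching_files_per_folder`.

```python
# Subset of the file rows from the example table (folders 0 and 8 only).
files = [
    {"id": 1, "prop_1": "val1", "folderId": 0},
    {"id": 2, "prop_1": "val3", "folderId": 0},
    {"id": 3, "prop_1": "val1", "folderId": 0},
    {"id": 9, "prop_1": "val2", "folderId": 8},
    {"id": 10, "prop_1": "val1", "folderId": 8},
    {"id": 11, "prop_1": "val2", "folderId": 8},
]


def matching_files_per_folder(file_rows, folder_ids, predicate):
    """For each folder hit, collect the ids of its files that satisfy predicate."""
    result = {fid: [] for fid in folder_ids}
    for row in file_rows:
        if row["folderId"] in result and predicate(row):
            result[row["folderId"]].append(row["id"])
    return result


# Folder ids as returned by the folder-only query for prop_1:val1.
folder_hits = [0, 8]
matched = matching_files_per_folder(
    files, folder_hits, lambda row: row["prop_1"] == "val1"
)
print(matched)  # {0: [1, 3], 8: [10]}

# Picking one file per folder is then a matter of taking the first entry.
print([ids[0] for ids in matched.values() if ids])  # [1, 10]
```

The ugliness is that the match condition has to be evaluated twice: once by the search on the folder documents and once again in client code on the files.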

But I fear the cleanest solution is to use the patch. Hopefully it can be
applied without hassles against 1.4 or the trunk. If not, please ask on
the patch site for assistance.
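
For readers unfamiliar with collapsing, the effect being asked for can be sketched in plain Python. This is not the SOLR-236 implementation and uses no Solr API; it only illustrates the idea of keeping the first matching document for each distinct folderId. The helper name `collapse_by_field` is hypothetical.

```python
# Subset of the file rows from the example table (folders 0 and 8 only).
files = [
    {"id": 1, "prop_1": "val1", "folderId": 0},
    {"id": 2, "prop_1": "val3", "folderId": 0},
    {"id": 3, "prop_1": "val1", "folderId": 0},
    {"id": 9, "prop_1": "val2", "folderId": 8},
    {"id": 10, "prop_1": "val1", "folderId": 8},
    {"id": 11, "prop_1": "val2", "folderId": 8},
]


def collapse_by_field(hits, field):
    """Keep only the first hit for each distinct value of field, in hit order."""
    seen = set()
    collapsed = []
    for doc in hits:
        key = doc[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(doc)
    return collapsed


# Hits for prop_1:val1 are files 1, 3 and 10; collapsing on folderId
# leaves one file per folder.
hits = [row for row in files if row["prop_1"] == "val1"]
print([doc["id"] for doc in collapse_by_field(hits, "folderId")])  # [1, 10]
```

Which file survives per folder depends on the order of the hits, so in practice the sort order of the query decides the representative.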

Regards,
Peter.


> Thanks, Peter!
>
> I'll try collapsing today.
>
> Example (sorry if the table is unformatted):
>
> id |  type  |   prop_1  | .... |  prop_N |  folderId
> ________________________________________
>  0 | folder |           |      |         |
>  1 | file   |  val1     |      |  valN1  |   0
>  2 | file   |  val3     |      |  valN2  |   0
>  3 | file   |  val1     |      |  valN3  |   0
>  4 | folder |           |      |         |
>  5 | folder |           |      |         |
>  6 | file   |  val3     |      |  valN7  |   6
>  7 | file   |  val4     |      |  valN8  |   6
>  8 | folder |           |      |         |
>  9 | file   |  val2     |      |  valN3  |   8
>  10| file   |  val1     |      |  valN2  |   8
>  11| file   |  val2     |      |  valN5  |   8
>  12| folder |           |      |         |
>
>
> I need to select always *one* file per folder or
> select *only* folders that contain matched files (without files).
>
> Query:
> prop_1:val1 OR prop_2:val2
>
> I need results (document ids):
> 1, 9
> or
> 0, 8
>
> 2010/7/23 Peter Karich <[email protected]>
>
>   
>> Hi Pavel!
>>
>> The patch can be applied to 1.4.
>> The performance is OK, but in some situations it can be worse than
>> without the patch.
>> For us it works well, but others have reported some exceptions
>> (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
>>
>>     
>>> I need only to delete duplicates
>>>       
>> Could you give us an example of what exactly you need?
>> (Maybe you could index each master document of the 'unique' documents
>> with an extra field and query for that field?)
>>
>> Regards,
>> Peter.
>>
>> --
>>     
> Pavel Minchenkov
>
>   


-- 
http://karussell.wordpress.com/
