On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> Hi,
> 
> by chance I realised that the uploaders table contains some names where names
> are not stripped:
> 
> udd=> select '"' || u.name || '"' as name_with_spaces, uploader
>       from uploaders u where name like '% ' or name like ' %' ;
>      name_with_spaces     |                 uploader                  
> --------------------------+-------------------------------------------
>  " Mehdi Dogguy"          |  Mehdi Dogguy <[email protected]>
>  " David Paleino"         |  David Paleino <[email protected]>
>  " Stéphane Glondu"       |  Stéphane Glondu <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  "Andreas Tille  "        | Andreas Tille   <[email protected]>
>  " LI Daobing"            |  LI Daobing <[email protected]>
>  " David Paleino"         |  David Paleino <[email protected]>
>  " Stefano Zacchiroli"    |  Stefano Zacchiroli <[email protected]>
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko <[email protected]>
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko <[email protected]>
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko <[email protected]>
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko <[email protected]>
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko <[email protected]>
>  "Colin Tuckley "         | Colin Tuckley  <[email protected]>
>  "Colin Tuckley "         | Colin Tuckley  <[email protected]>
>  "Colin Tuckley "         | Colin Tuckley  <[email protected]>
> (20 rows)
> 
> 
> This causes slight errors when counting uploads of people.  My guess is this
> is due to some old importer code (I've checked the hit for my name which
> is a pretty old upload).  Thus I wonder whether the easiest fix might be
> a proper UPDATE statement that removes the unneeded spaces.  This
> statement does the trick in my local clone:
> 
>    UPDATE uploaders
>       SET name = trim(name), uploader = trim(name) || ' ' || email
>     WHERE name like ' %' or name like '% ' ;
> 
> If I'm not mistaken, historic uploads will not be imported from scratch,
> so this would cure the situation.  Otherwise users always need to
> remember to add trim(name) when dealing with the uploaders.name column,
> not to mention that it gets even harder to deal with the uploader column,
> which might feature extra spaces in the middle.
> 
> What do you think?

Hi,

The uploaders table is refreshed every few hours from archive data, so a
one-time UPDATE would not help. UDD usually tries to preserve such
inaccuracies, since they might be interesting for QA work.
In your case, why don't you use the email address to identify uploaders?
(Possibly combining it with the carnivore data to identify different emails
belonging to the same person?)
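
A minimal sketch of what I mean, not a tested query: it groups on the
email column instead of the name, so stray spaces in name no longer split
one person into several rows. It only assumes the name and email columns
used in your statement above; the aliases display_name and entries are
made up for illustration, and what a row of uploaders actually counts is
left for you to check.

```sql
-- Sketch only: identify people by email rather than by name.
-- Assumes uploaders has the name and email columns from the quoted UPDATE.
SELECT email,
       min(trim(name)) AS display_name,  -- one trimmed spelling per address
       count(*)        AS entries        -- rows in uploaders for that address
  FROM uploaders
 GROUP BY email
 ORDER BY entries DESC;
```

This does not merge several addresses of the same person; that part would
still need the carnivore data.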

Lucas
