For systematic literature reviews (an important use case), I'd recommend searching on the platform itself (e.g. PubMed or another database) rather than inside a bibliographic manager (like BibDesk). It is also useful to save the search results (e.g. as an RIS-formatted text file) for future use, for instance as a ground truth to compare against after deduplication.
The issue of tracking, inside BibDesk, what has been saved (when paging through a large query) and what has not, seems to me the bigger priority here.

-Jodi

On Wed, Oct 6, 2021 at 6:57 AM mn <[email protected]> wrote:

> On 06.10.21 11:38, Christiaan Hofman wrote:
> >>>>>>> For efficiency, we don’t fetch all the results at once. If you show
> >>>>>>> the status bar, you may see the total number of available results.
> >>>>>>>
> >>>>>>> If you search repeatedly, further results will be fetched.
> >>>>>>
> >>>>>> OK.
> >>>>>> Is there some way to fetch all results at once?
> >>>>>> A setting or option to customize the number of results fetched?
> >>>>>> If not, adding those would be most welcome.
> >>>>>
> >>>>> No, there isn’t.
> >>>>
> >>>> BTW, this is not just our choice. It is also the policy for the server
> >>>> for the web interface. And they threaten to block your IP address when
> >>>> you don’t comply with their policy, so I don’t think it is a good idea
> >>>> to ignore that.
> >>>
> >>> On their website
> >>> https://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen
> >>> they give the following:
> >>>
> >>> > In order not to overload the E-utility servers, NCBI recommends that
> >>> users post no more than three URL requests per second and limit large
> >>> jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time
> >>> during weekdays. Failure to comply with this policy may result in an IP
> >>> address being blocked from accessing NCBI.
> >>>
> >>> The mechanics behind the scenes here elude me: BibDesk fetches 50
> >>> _results_ in one go, apparently fine with the '≤3 URL requests /s'?
> >>>
> >>> On the face of it, I'd conclude that smaller portions (like 20 results
> >>> per 'search') would be fine in any case.
> >>> But then 150 results at a time as well?
> >>>
> >>> The weekday angle seems quite vague, but open to the interpretation
> >>> that larger requests on weekends will be possible/tolerated?
> >>>
> >>> But I suspect that 'URL requests' as limited on the server and
> >>> BibDesk's result fetching do not even correspond in that respect?
> >>>
> >>
> >> A comment in our code also mentions a limit on the number of results.
> >> Perhaps they have changed that policy over time. But getting a larger
> >> number of results can also slow down the search (a lot). Also, every
> >> fetch action is a URL action. The first one is two, because we first
> >> need to get the number of results.
> >>
> >
> > BTW, if you really get a very large number of results you probably did
> > not target your search very well, and fetching a large number of results
> > is really not that useful, just wasteful. Are you going to look for the
> > result you need in 15000 items?
> >
>
> The waste angle is in the eye of the beholder. I'll certainly not read
> line by line through 79000 results. But I also find the result fetching
> in predefined chunks of 50 problematic.
>
> There are of course different usage scenarios for BibDesk.
>
> The goal here is to approach something with the dimensions of a
> systematic literature review, with as much work done within BibDesk as
> possible, to get a minimized workflow.
>
> This is inspired by JabRef's functionality for this (but that is just my
> understanding of how it's advertised in the menus of that app, as it
> then always crashes on my system due to some Java bugs; plus I
> previously much preferred BibDesk for the work I needed done, which were
> admittedly much smaller projects that just grew larger over time).
>
> So I noticed that in BibDesk, with results expected to be (somewhat)
> largish, it gets difficult for me to keep track of what's done and
> what remains to be done. Re-downloading the same results and analyzing
> them again is certainly also wasteful.
> How to solve that with more advanced queries and better targeting is
> unresolved until now.
>
> My thinking was: fetch results online, then work offline to search
> and sort within those downloaded results.
>
> It may be a suboptimal approach after all, but with the advanced search
> on PubMed, I have a much easier time narrowing down the results via
> (better?) queries. My understanding is that within BibDesk, that is
> limited to spelling out the exact query directly (perhaps doable, but
> with GUI options it certainly gets that much easier to e.g. limit
> publication dates).
>
> From that search results page I also just created a query that then
> allowed me to download 2712 results –with abstracts included– into one
> text file. Is that not reproducible within BibDesk's access to Entrez?
>
>
> — Mike
>
>
> _______________________________________________
> Bibdesk-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bibdesk-users
>
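P.S. For anyone who wants to reproduce Mike's 2712-result download outside BibDesk: the chunked fetching discussed above maps onto NCBI's documented E-utilities parameters (`retstart`/`retmax`, plus the `WebEnv`/`query_key` history server). Below is a minimal Python sketch of batched downloading that stays under the "three requests per second" guideline. The batch size, sleep interval, and output filename are my own choices, not anything NCBI or BibDesk prescribes:

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term):
    """ESearch URL that stores the full result set on the history server."""
    params = {"db": "pubmed", "term": term, "usehistory": "y",
              "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?" + urllib.parse.urlencode(params)

def efetch_url(webenv, query_key, retstart, retmax):
    """EFetch URL for one batch, as MEDLINE text (includes abstracts)."""
    params = {"db": "pubmed", "WebEnv": webenv, "query_key": query_key,
              "retstart": retstart, "retmax": retmax,
              "rettype": "medline", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urllib.parse.urlencode(params)

def batch_offsets(total, batch_size):
    """Start offsets needed to cover `total` results in `batch_size` chunks."""
    return list(range(0, total, batch_size))

def download_all(term, batch_size=200, out="results.medline"):
    """One ESearch to get the count, then EFetch in batches, rate-limited."""
    reply = json.load(urllib.request.urlopen(esearch_url(term)))
    result = reply["esearchresult"]
    total = int(result["count"])
    with open(out, "w") as fh:
        for start in batch_offsets(total, batch_size):
            url = efetch_url(result["webenv"], result["querykey"],
                             start, batch_size)
            fh.write(urllib.request.urlopen(url).read().decode())
            time.sleep(0.4)  # stay well under NCBI's 3 requests/second
```

With these assumed defaults, the 2712-result example above would cost one ESearch plus fourteen EFetch requests, spaced out over a few seconds — comfortably within the published policy.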
