Hello, I am writing to ask about the usage limits and etiquette I should follow when consuming the full-text search API *server side*.

I am building a site for visualization and knowledge discovery of Wikipedias. It will be a personally funded project (at least initially!) for public use. Investing more in indexing with Elasticsearch would be beyond my means and also beyond the scope of the project, whose focus is on visualization and discovery. I also think there is no need to reinvent the wheel :) I would like to work out the best setup, in terms of usability and request rate, for using the full-text search API in compliance with your policy. Would you please take a minute to read the details below?

***

Currently my setup uses my own database: for full-text search I use Elasticsearch at a very basic level. I then use the Wikipedia API to decorate my data, *client side (AJAX)*.

Although slower than what I have now, the Wikipedia full-text API is much more useful for a user. It returns results for complex queries that I cannot serve, because I am indexing only article titles.

I would like to add full-text search against the Wikimedia API from the server side. I want to make sure I comply with the Wikimedia Foundation's policy if I make concurrent requests on behalf of users.

- *Is there any limit on the number of requests I can make from a web domain?*

I would like to use the wikitools Python library. The query I need to run uses a *search generator* over the article namespace only:

action=query&generator=search&gsrnamespace=0&gsrsearch='my query'&gsrlimit=20

I tested it from my laptop and found it quite slow. For example, it took:

~1.2 s for 'DNA'
~1.6 s for 'terroristi attacks'
~1.7 s for 'biology technology'

and I am currently on a very fast wifi network.

- *How could I improve performance?*
- *Is it possible to apply for a desired rate of requests?*

I also read that it is good etiquette to specify contact details in the request *headers*, in case you need to get in touch with the domain owner. It is not clear to me what I should do.

- *Could you please show me how to do it with an example in Python (here using the Flask framework)?*

Thank you very much for your help,
Luigi
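
P.S. In case it helps, here is a minimal sketch of the request I have been timing, written with the Python standard library rather than wikitools. The endpoint (English Wikipedia), the User-Agent string, the contact address and the function names are my own placeholders, and putting the contact details in the User-Agent header is only my guess at what the etiquette asks for, so please correct me if this is not what you expect.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoint: the English Wikipedia API (other wikis use their own host).
API_URL = "https://en.wikipedia.org/w/api.php"

# Placeholder identification string: site name, URL and contact address are
# hypothetical and only illustrate where I think contact details would go.
USER_AGENT = "MyWikiVizSite/0.1 (https://example.org; contact@example.org)"


def build_search_request(query, limit=20):
    """Build the search-generator request over the article namespace only."""
    params = {
        "action": "query",
        "generator": "search",
        "gsrnamespace": 0,      # namespace 0 = articles
        "gsrsearch": query,
        "gsrlimit": limit,
        "format": "json",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def timed_search(query, limit=20):
    """Send the request and return (decoded JSON, elapsed seconds)."""
    request = build_search_request(query, limit)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    elapsed = time.perf_counter() - start
    return data, elapsed
```

Calling timed_search('DNA') is how I would reproduce the timings quoted above. In the Flask app I imagine the view function would simply call timed_search with the user's query, so the same header would be sent on every server-side request.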

_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
