Re: [discovery] Etiquette: Consumption of Wikipedia API from backend (full-text search)

Luigi Assom Wed, 23 Dec 2015 09:17:43 -0800

Hello David!

> Your project seems to be very interesting, could you elaborate a bit more?

Thank you so much!
I would be happy to elaborate on it over a Skype call: I
could share my screen and show you what I'm cooking up :D

Back to your reply now:

Yes, I was mainly testing during the hours when both Europe and the USA are online.
However, I am seeing these delays from my laptop; maybe it will be faster once
deployed, since my home network may be the problem?

I am concerned because I first need to fetch results from Wikipedia, then
process them with my own data (that step is fast enough, <200 ms), and then push
the result to the client. That is why I will do this server-side rather than
client-side.
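A minimal Python sketch of that server-side flow, assuming hypothetical `fetch_wikipedia` and `enrich` callables (stand-ins for the Wikipedia request and the local processing, not part of any real library). It times each stage so the Wikipedia round trip can be separated from the <200 ms local step:

```python
import time
from typing import Callable, Dict, List, Tuple

def handle_search(
    keywords: str,
    fetch_wikipedia: Callable[[str], List[dict]],
    enrich: Callable[[List[dict]], List[dict]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[dict], Dict[str, float]]:
    """Fetch from Wikipedia, enrich locally, return the payload plus per-stage timings (seconds)."""
    t0 = clock()
    hits = fetch_wikipedia(keywords)  # remote API call (hypothetical callable)
    t1 = clock()
    payload = enrich(hits)            # local processing with your own data
    t2 = clock()
    timings = {"fetch_s": t1 - t0, "enrich_s": t2 - t1, "total_s": t2 - t0}
    return payload, timings  # payload is what gets pushed to the client
```

Injecting `clock` keeps the function testable; in production the default `time.perf_counter` is used.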

I need the search generator only as the *first entry point*: imagine you need to
search for a topic, but you don't know exactly what it is. Imagine an input form:
you type in some keywords, select one of the results, and then start your
session.
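For reference, a full-text search used as a generator can be requested from the MediaWiki Action API with `action=query&generator=search`. The sketch below only builds the request URL; the English Wikipedia endpoint, the `gsrlimit` value, and the function name `build_search_url` are illustrative assumptions:

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_search_url(keywords: str, limit: int = 10, endpoint: str = API_ENDPOINT) -> str:
    """Build a full-text search request that uses list=search as a page generator."""
    params = {
        "action": "query",
        "generator": "search",  # full-text search, results returned as pages
        "gsrsearch": keywords,  # the keywords typed into the input form
        "gsrlimit": limit,      # how many candidate topics to offer
        "format": "json",
        "formatversion": 2,     # 'query.pages' comes back as a list
    }
    return endpoint + "?" + urlencode(params)
```

For example, `build_search_url("biology technology")` encodes the space as `+` in the `gsrsearch` parameter.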

I cannot estimate exactly how many full-text search queries I will need; let's
say each user will need the search generator only once per session.

Maybe 30 concurrent users per second would be a good reference (it's the same
number that Facebook's Parse provides, and Firebase goes up to 100... so maybe I
could rely on a similar order of magnitude...)
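A back-of-the-envelope check of that target, reading it as 30 new sessions per second with one search each, and taking the ~1.3 s latency reported in this thread. It applies Little's law (average requests in flight = arrival rate × time per request); `capacity_estimate` is a hypothetical helper:

```python
def capacity_estimate(users_per_second: float, searches_per_session: float, latency_s: float) -> dict:
    """Rough load figures for the search entry point (back-of-the-envelope only)."""
    qps = users_per_second * searches_per_session  # new sessions/s x searches per session
    return {
        "queries_per_second": qps,
        "queries_per_hour": qps * 3600,
        # Little's law: average requests in flight = arrival rate x time each one takes
        "in_flight": qps * latency_s,
    }

# 30 new users per second, one search each, ~1.3 s per API call
print(capacity_estimate(30, 1, 1.3))
```

Under these assumptions that is 108,000 searches per hour with about 39 requests open at any moment; at the ~400 ms David reports, the in-flight figure drops to about 12.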

If I can give people a smooth search experience, that will be valuable because it
would free up resources: I might extend the knowledge-discovery test to other
languages, too.
If the first user experience were too slow (~1.3 s, or ~1.5 s or more per query
once bandwidth/transmission time is added), that could become critical.

I don't need the search generator to operate in batch or to track changes.
It just helps the user find a topic as an entry point for discovery.
I cannot use 'Opensearch' because it does not provide IDs; also, it
searches against titles only.
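Unlike Opensearch, whose response is a list of titles, descriptions and URLs, a `generator=search` query returns page objects that carry a `pageid`. A minimal sketch of pulling (pageid, title) pairs out of such a response, assuming `formatversion=2` (so `query.pages` is a list) and using placeholder data rather than a real API reply:

```python
from typing import Any, Dict, List, Tuple

def extract_candidates(response: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (pageid, title) pairs from a generator=search response (formatversion=2)."""
    pages = response.get("query", {}).get("pages", [])
    # Sort on the search rank ("index") in case pages are not listed in rank order.
    ranked = sorted(pages, key=lambda page: page.get("index", 0))
    return [(page["pageid"], page["title"]) for page in ranked]

# Placeholder response with the same shape as the API reply; IDs and titles are made up.
sample = {
    "query": {
        "pages": [
            {"pageid": 202, "ns": 0, "title": "Topic B", "index": 2},
            {"pageid": 101, "ns": 0, "title": "Topic A", "index": 1},
        ]
    }
}
print(extract_candidates(sample))
```

The selected `pageid` can then serve as the stable identifier for the rest of the session.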

Would it be possible to somehow reserve bandwidth or requests for a domain?

On Wed, Dec 23, 2015 at 3:55 PM, David Causse <[email protected]> wrote:

> On 22/12/2015 18:28, Luigi Assom wrote:
>
>> I tested it from my laptop and found it quite slow; for example, it
>> took:
>>
>> ~1.2 s for querying 'DNA'
>>
>> ~1.6 s for 'terroristi attacks'
>>
>> ~1.7 s for 'biology technology'
>>
>>
> For a single-word query on English Wikipedia this is more like 400 ms for
> me, so I'm not sure I understand why you experienced such response times.
> Response times may vary depending on server load, but I'm surprised you
> saw more than 1 s for simple queries like that.
> Did you check that you are receiving the result type/format you expect
> (e.g. format=json)?
> Could you re-check at different times of the day? Servers may be busy
> around 8 pm CET (when both Europe and America are active).
>
> Your project seems to be very interesting, could you elaborate a bit more?
> Do you plan to use the API from a backend/automated process that will need
> to send a lot of queries? Do you have an estimate of your needs (number of
> queries and refresh rate)?
> If your process involves refreshing a set of queries regularly, I'd suggest
> you build a daemon that sends a few queries (3 or 4) per minute rather than an
> aggressive batch of parallel processes run once a day/week/month.
> You should have a look at RCStream[1], which may be more appropriate for
> your needs (if you plan to track changes, it's definitely better than
> refreshing the same set of queries regularly).
>
> Thank you!
>
> [1] https://wikitech.wikimedia.org/wiki/RCStream
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
>
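David's suggestion of a daemon that sends a few queries per minute, instead of a parallel batch, could look roughly like the sketch below (one refresh pass; a real daemon would repeat it). `paced_refresh` and its `send` callable are hypothetical, and the default of 3 queries per minute simply mirrors the "3 or 4" in the advice above:

```python
import time
from typing import Callable, Iterable, List

def paced_refresh(
    queries: Iterable[str],
    send: Callable[[str], object],
    queries_per_minute: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Send queries sequentially, evenly spaced, instead of as a parallel burst."""
    if queries_per_minute <= 0:
        raise ValueError("queries_per_minute must be positive")
    interval = 60.0 / queries_per_minute  # e.g. 3 per minute -> one every 20 s
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(interval)  # pause between requests; never more than one in flight
        results.append(send(query))
    return results
```

Passing `sleep` as a parameter lets the pacing be checked without actually waiting.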
