On Monday, 19 March 2018, at 17:30:38 CEST, suzuki toshiya wrote:
> Hi,
>
> Recently I heard that some people want to retrieve the list
> of words from a PDF, as with cpp's poppler::page::text_list(),
> but with the font information (e.g. the family name of
> the font).
>
> Considering that office documents and academic articles
> often use different fonts for the section titles and
> the main text, it would be reasonable for people to
> expect something like "I want to retrieve the text boxes,
> but only the text boxes written in Helvetica-Bold".
>
> What is the right way to do this? During the development
> of poppler::page::text_list(), I once tried to do it:
> https://github.com/mpsuzuki/poppler/commit/8ce2556a62a90c034d7cea8b1dfd26715d03a8f0
> (note: this patch was written before the stabilization
> of unique_ptr utilization. More fixes are expected in the future.)
>
> However, I feel it's slightly too big. Its changes are
> not only to the cpp frontend code, but also to poppler/FontInfo.{cc,h}
> and poppler/TextOutputDev.{cc,h}. I want to ask a few
> questions...
>
> Q-1) Does a request for text_box with font info fit poppler's
> scope? Is there a better library to request such a feature from?

We already have it in TextOutputDev, so sure, why not.

> Q-2) If this request fits poppler's scope, is enhancing
> the cpp frontend poppler::page::text_list() the way to
> go? Or is having a different API for this purpose better?

Well, the API is exactly the problem here. What do you plan to expose, only a
string? I've seen you've added font_size and wmode too. Is that enough?

Also, you really need some documentation. If I have a look at that class and
see

  int get_wmode(int i = 0) const;

without any kind of documentation, I wouldn't know what to do with that
function.

> Q-3) My current patch modifies FontInfo and TextOutputDev
> of libpoppler itself. Is such a modification acceptable?

If you don't create bugs or make it slower for other use cases, sure, why
wouldn't such modifications be acceptable?

Cheers,
  Albert

> I would appreciate it if the maintainers could give some comments.
>
> Regards,
> mpsuzuki
>
> _______________________________________________
> poppler mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/poppler
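
To make the documentation point concrete, here is a minimal sketch of what documented per-glyph font accessors and a "only Helvetica-Bold" filter could look like. It is not the patch under discussion and not poppler API: GlyphFont, TextBoxSketch, glyph_count(), get_font_name() and filter_by_font() are hypothetical names, and reading the `i` parameter as a glyph index is an assumption. The 0 = horizontal, 1 = vertical meaning of wmode follows the PDF WMode convention.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these names are poppler API.
struct GlyphFont {
    std::string name;  // e.g. "Helvetica-Bold"
    double size;       // font size in points
    int wmode;         // 0 = horizontal, 1 = vertical (PDF WMode convention)
};

class TextBoxSketch {
public:
    TextBoxSketch(std::string text, std::vector<GlyphFont> fonts)
        : text_(std::move(text)), fonts_(std::move(fonts)) {}

    // Text content of the box.
    const std::string &text() const { return text_; }

    // Number of glyphs that carry font information.
    std::size_t glyph_count() const { return fonts_.size(); }

    // Writing mode of the font used for glyph i:
    // 0 for horizontal writing, 1 for vertical writing.
    // i must be less than glyph_count() (assumed meaning of the index).
    int get_wmode(std::size_t i = 0) const {
        assert(i < fonts_.size());
        return fonts_[i].wmode;
    }

    // Name of the font used for glyph i; i must be less than glyph_count().
    const std::string &get_font_name(std::size_t i = 0) const {
        assert(i < fonts_.size());
        return fonts_[i].name;
    }

private:
    std::string text_;
    std::vector<GlyphFont> fonts_;
};

// Keep only the boxes in which every glyph uses the requested font.
// Boxes without any font information are dropped.
std::vector<TextBoxSketch> filter_by_font(const std::vector<TextBoxSketch> &boxes,
                                          const std::string &font_name) {
    std::vector<TextBoxSketch> out;
    for (const auto &box : boxes) {
        bool all_match = box.glyph_count() > 0;
        for (std::size_t i = 0; all_match && i < box.glyph_count(); ++i)
            all_match = (box.get_font_name(i) == font_name);
        if (all_match) out.push_back(box);
    }
    return out;
}
```

Whether a box with mixed fonts should be kept, dropped, or split is a design choice the real API would have to document; the sketch drops it.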
