Thanks, Ariel. One small thing: where exactly can I report it upstream? Do you
have a URL?

Michael


On Tue, May 21, 2013 at 5:45 PM, Ariel T. Glenn <[email protected]> wrote:

> If you can stomach it, I would report it upstream, linking to the earlier
> version of the bug they had with a proposed patch etc.  I can give them
> a test file consisting of the one page with all its revisions, "only"
> 170 mb uncompressed :-D
>
> It's fine to open a report locally too in mwdumper and link the upstream
> report.
>
> Thanks,
>
> Ariel
>
> On 21-05-2013, Tue, at 15:57 +0200, Michael Tsikerdekis wrote:
> > Update on the matter. I've edited pom.xml and changed the xerces version,
> > which was set to 2.7.1, to 2.9.1, 2.11.0, 2.8.0 and other versions.
> >
> > With the later versions the out-of-bounds error changes, but it still
> > persists.
> > Also, I tried running mwdumper on an older Wikipedia dump, 20130102.
> >
> > The error still appears, this time on the first file:
> > enwiki-20130102-pages-meta-history1.xml-p000000010p000002070.7z
> >
> > Should I file a new bug in Bugzilla for mwdumper?
> >
> > Michael
> >
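When swapping xerces versions in pom.xml, it can help to confirm which parser the JVM actually loads at run time, since the stack trace in this thread only shows org.apache.xerces class names, not their release. A minimal sketch, assuming Java 7 or later; the class name ParserInfo is hypothetical, and the Xerces release is read reflectively from org.apache.xerces.impl.Version (provided by Xerces2-J), so the sketch also compiles without the jar.

```java
import java.lang.reflect.Method;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which SAX parser JAXP resolves to at run time
// and, if Apache Xerces is on the classpath, which Xerces release it is.
final class ParserInfo {

    static String factoryClass() {
        // newInstance() honours META-INF/services entries, so a xerces jar on
        // the classpath (as in mwdumper's build) takes precedence over the
        // JDK's built-in parser.
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String xercesVersion() {
        try {
            // Looked up reflectively so this compiles without the xerces jar.
            Class<?> version = Class.forName("org.apache.xerces.impl.Version");
            Method getVersion = version.getMethod("getVersion");
            return (String) getVersion.invoke(null);
        } catch (ReflectiveOperationException e) {
            return "Apache Xerces not on the classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("SAXParserFactory: " + factoryClass());
        System.out.println("Xerces: " + xercesVersion());
    }
}
```

Run it with the same classpath as mwdumper (for example, alongside the jar built from the edited pom.xml) to check that the intended xerces release is the one actually being loaded.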
> >
> > On Mon, May 20, 2013 at 4:49 PM, Michael Tsikerdekis <[email protected]> wrote:
> >
> > > Great! At least we know what's causing it. I had seen the thread about
> > > xerces before, but it was so old that I assumed it was unrelated.
> > >
> > > Let me know when there is a new build to try out, or if there is anything
> > > else I can do to help fix the problem.
> > >
> > > Michael
> > >
> > >
> > > On Mon, May 20, 2013 at 4:41 PM, Ariel T. Glenn <[email protected]> wrote:
> > >
> > >> On 20-05-2013, Mon, at 13:18 +0200, Michael Tsikerdekis wrote:
> > >>
> > >> > 33 pages (0.593/sec), 25,374 revs (455.695/sec)
> > >> > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
> > >> >         at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
> > >> >         at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
> > >> >         at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
> > >> >         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
> > >> >         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> > >> >         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> > >> >         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > >> >         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > >> >         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> > >> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> > >> >         at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> > >> ...
> > >>
> > >> The file itself is fine; proof of that is that I isolated the
> > >> problematic page, removed the first revision (which had been processed
> > >> without problems) and then all remaining revisions including the 'bad'
> > >> one were handled properly.
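The isolation check described above amounts to confirming that the XML is well formed and that every revision of the page can be read. A minimal sketch of a similar check, assuming Java 9 or later: it streams an uncompressed dump (or a single extracted page) through the JDK's built-in SAX parser, obtained with SAXParserFactory.newDefaultInstance() so that the xerces jar mwdumper builds against is bypassed, and counts <page> and <revision> elements. The class RevisionCounter is hypothetical and is not part of mwdumper.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams a MediaWiki XML dump and counts <page> and
// <revision> elements using the JDK's own JAXP parser.
final class RevisionCounter {

    static long[] count(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        // newDefaultInstance() ignores any xerces jar on the classpath and
        // returns the JDK's built-in SAX parser factory.
        SAXParserFactory factory = SAXParserFactory.newDefaultInstance();
        SAXParser parser = factory.newSAXParser();
        long[] counts = new long[2]; // [0] = pages, [1] = revisions
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The factory is not namespace-aware, so match on qName.
                if (qName.equals("page")) {
                    counts[0]++;
                } else if (qName.equals("revision")) {
                    counts[1]++;
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Expects an uncompressed XML file path as the first argument.
        try (InputStream in = new FileInputStream(args[0])) {
            long[] c = count(in);
            System.out.println(c[0] + " pages, " + c[1] + " revisions");
        }
    }
}
```

If this runs to completion on the same input where mwdumper fails, that points at the parser rather than the data, which is consistent with the conclusion in this thread.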
> > >>
> > >> This is most likely a regression:
> > >> http://www.gossamer-threads.com/lists/wiki/mediawiki/128069
> > >> Our spec says to build against Maven's xerces version 2.7.1, and I
> > >> expect that version never got the patch [1]. I'm not sure which version
> > >> of the xerces library is good ([2]).
> > >>
> > >> I'm adding Chad back on the cc though since he'll have to update the
> > >> build specs.  Chad, do you want a bugzilla report for this?
> > >>
> > >> Ariel
> > >>
> > >> [1] http://www.gossamer-threads.com/lists/wiki/mediawiki/128069
> > >> [2] https://issues.apache.org/jira/browse/XERCESJ-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697205#action_12697205
> > >>
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> MediaWiki-l mailing list
> > >> [email protected]
> > >> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
> > >>
> > >
> > >
>
>
>
>