I have written and attached an example that compares bufio.Reader and
bufio.Scanner.
Here's the output from `go run .` (a line count followed by the first error
encountered):
```
Reader 1333665 <nil>
Scanner 777758 bufio.Scanner: token too long
```
This probably _won't_ fail on your 2M line file; it looks like the problem
is line length rather than line count, since a Debian Packages file
apparently contains some very long lines. If you have a Debian-derived
distro you could try replacing the filename in the attached program (the
`pkgFile` constant) with one from `/var/lib/apt/lists/`.
The docs for bufio.Scanner do say:
"Programs that need more control over error handling or large tokens, or
must run sequential scans on a reader, should use bufio.Reader instead."
Perhaps it would be more helpful if that paragraph also stated what the
token length limit is?
On Thursday, October 12, 2023 at 9:45:10 AM UTC+1 Rob Pike wrote:
> I just did a simple test with a 2M line file and it worked fine, so I
> suspect it's a bug in your code. But if not, please provide a complete
> working executable example, with data, to help identify the problem.
>
> -rob
>
>
> On Thu, Oct 12, 2023 at 7:39 PM 'Mark' via golang-nuts <
> [email protected]> wrote:
>
>> I'm reading Debian *Package files, some of which are over 1M lines long.
>> I used bufio.Scanner and found that it won't read past 1M lines (I'm
>> using Go 1.21.1 linux/amd64).
>> Is this a limitation of bufio.Scanner? If so then it ought to be in the
>> docs.
>> Or is it a bug?
>> Or maybe I made a mistake (although using bufio.Scanner seems easy)?
>> ```
>> scanner := bufio.NewScanner(file)
>> lino := 1
>> for scanner.Scan() {
>>     line := scanner.Text()
>>     lino++
>>     ... // etc
>> }
>> ```
>> Anyway, I've switched to using bufio.Reader and that works great.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/golang-nuts/69f2fa03-c650-4c02-9470-51894dc56d1an%40googlegroups.com
>>
>
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/ebf242ab-c8aa-4de8-821b-3abe77a9da86n%40googlegroups.com.
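package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// pkgFile is the file to read; replace it with any Packages file from
// /var/lib/apt/lists/ on a Debian-derived system.
const pkgFile = "/var/lib/apt/lists/gb.archive.ubuntu.com_ubuntu_" +
	"dists_jammy_universe_binary-amd64_Packages"

// main prints, for each approach, the number of lines read followed by
// the first error encountered.
func main() {
	lines, err := readPackages(pkgFile)
	fmt.Println("Reader", lines, err)
	lines, err = scanPackages(pkgFile)
	fmt.Println("Scanner", lines, err)
}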
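
// readPackages counts the lines in filename using bufio.Reader, which
// places no upper limit on line length.
func readPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	reader := bufio.NewReader(file)
	lines := 0
	for {
		line, err := reader.ReadString('\n')
		if err == io.EOF {
			// Count a final line that has no trailing newline.
			if len(line) > 0 {
				lines++
			}
			break
		} else if err != nil {
			// Report the lines read so far along with the error.
			return lines, err
		}
		lines++
	}
	return lines, nil
}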
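
// scanPackages counts the lines in filename using bufio.Scanner with its
// default buffer, so a sufficiently long line stops the scan early.
func scanPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	lines := 0
	for scanner.Scan() {
		_ = scanner.Text()
		lines++
	}
	// Scan returns false on error as well as at EOF; Err tells them apart.
	return lines, scanner.Err()
}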