Re: Grep on dictionary words

2009-11-30 Thread Mike Castle
On Sat, Nov 28, 2009 at 7:13 AM, Dotan Cohen wrote:
> I have a long binary file (about 12 MB) that I need to extract the
> text from via "strings". Naturally, there are a lot of junk lines such
> as these:
> pDuf
> #k0H}g)
> GoV5
> rLeY1
> TMlq,*
>
> Is there a way to grep the output of strings in

Re: Grep on dictionary words

2009-11-30 Thread Andrew Sackville-West
On Sun, Nov 29, 2009 at 11:14:58AM +0200, Dotan Cohen wrote:
> 2009/11/29 Andrew Sackville-West:
> > On Sun, Nov 29, 2009 at 01:22:15AM +0200, Dotan Cohen wrote:
> >> > will get the ones that start with capital alphas. if you want initial
> >> > caps *only* then:
> >> >
> >> > grep "^[A-Z][a-z]*$"

Re: Grep on dictionary words

2009-11-29 Thread Emanoil Kotsev
Dotan Cohen wrote:
>
> This means that only words that start with a caps are valid. I need
> "can start with a caps, but caps can be nowhere else". I got that like
> this:
> grep "^[A-Za-z][a-z]*$"
> However I think that there is a better way.
>
> This is a good exercise. I am bettering my regex

Re: Grep on dictionary words

2009-11-29 Thread Dotan Cohen
2009/11/29 Andrew Sackville-West:
> On Sun, Nov 29, 2009 at 01:22:15AM +0200, Dotan Cohen wrote:
>> > will get the ones that start with capital alphas. if you want initial
>> > caps *only* then:
>> >
>> > grep "^[A-Z][a-z]*$"
>> >
>> > would match those.
>> >
>>
>> Thanks. I meant that caps could

Re: Grep on dictionary words

2009-11-29 Thread Tzafrir Cohen
On Sun, Nov 29, 2009 at 01:22:15AM +0200, Dotan Cohen wrote:
> > will get the ones that start with capital alphas. if you want initial
> > caps *only* then:
> >
> > grep "^[A-Z][a-z]*$"
> >
> > would match those.
> >
>
> Thanks. I meant that caps could only be at the beginning of a word,
> not in

Re: Grep on dictionary words

2009-11-28 Thread John Hasler
Dotan writes:
> Is there a way to grep the output of strings in order to only show
> lines that contain words found in the aspell dictionary?
Try this:
#!/bin/bash
strings "$1" | while read line
do
  if [ ` echo "$line" | sed -e 's/[^a-zA-Z ]//g' | wc -m` -lt 6 ]
  then
    continue
  fi
  echo "$line" | s

Re: Grep on dictionary words

2009-11-28 Thread Andrew Sackville-West
On Sun, Nov 29, 2009 at 01:22:15AM +0200, Dotan Cohen wrote:
> > will get the ones that start with capital alphas. if you want initial
> > caps *only* then:
> >
> > grep "^[A-Z][a-z]*$"
> >
> > would match those.
> >
>
> Thanks. I meant that caps could only be at the beginning of a word,
> not in

Re: Grep on dictionary words

2009-11-28 Thread Dotan Cohen
> will get the ones that start with capital alphas. if you want initial
> caps *only* then:
>
> grep "^[A-Z][a-z]*$"
>
> would match those.
>
Thanks. I meant that caps could only be at the beginning of a word, not in the middle. Expanding your example, I figured that would be: grep "^[A-Z]?[a-z]*$
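A note on the pattern quoted above: plain grep uses basic regular expressions, where `?` is an ordinary character, so the "optional leading capital" reading needs extended syntax (`grep -E`). The following is only a minimal sketch of that idea; the sample input is made up for illustration and is not from the thread.

```shell
# Hypothetical sample standing in for the output of strings.
sample='Hello
hello
heLLo
GoV5
rLeY1
world'

# -E enables extended regular expressions, where ? means "zero or one".
# LC_ALL=C keeps [A-Z] and [a-z] restricted to ASCII letters.
result=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '^[A-Z]?[a-z]*$')
printf '%s\n' "$result"
```

With this sample only Hello, hello and world get through. Because both parts of the pattern are optional, it also matches empty lines, so `[a-z]+` may be a better fit if those should be dropped.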

Re: Grep on dictionary words

2009-11-28 Thread Andrew Sackville-West
On Sun, Nov 29, 2009 at 12:00:33AM +0200, Dotan Cohen wrote:
> > ISTM that because the output of strings is not discrete list of
> > potential words, but is instead a long list of concatenated
> > characters, this problem is really rather daunting. The output should
> > probably be first broken up

Re: Grep on dictionary words

2009-11-28 Thread Florian Kriener
On Saturday 28 November 2009 16:13:55 Dotan Cohen wrote:
> I have a long binary file (about 12 MB) that I need to extract the
> text from via "strings". Naturally, there are a lot of junk lines
> such as these:
> pDuf
> #k0H}g)
> GoV5
> rLeY1
> TMlq,*
>
> Is there a way to grep the output of stri

Re: Grep on dictionary words

2009-11-28 Thread Dotan Cohen
> ISTM that because the output of strings is not discrete list of
> potential words, but is instead a long list of concatenated
> characters, this problem is really rather daunting. The output should
> probably be first broken up into something resembling words by perhaps
> breaking on non-alphabet
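One possible way to do the "break it up into something resembling words" step described in the quote is `tr`, turning every run of non-letters into a single newline. This is a sketch only; the sample line is invented and the thread does not say which tool was intended.

```shell
# Hypothetical line, as strings might emit it.
line='GoV5 rLeY1 hello,world #k0H}g)'

# -c complements the set (everything that is NOT a letter),
# -s squeezes each run of replaced characters into one newline.
tokens=$(printf '%s\n' "$line" | LC_ALL=C tr -cs 'A-Za-z' '\n')
printf '%s\n' "$tokens"
```

Each candidate word then sits on its own line, which makes it straightforward to feed the result into a regex filter or a dictionary lookup. Very short fragments such as single letters survive the split, so a minimum-length check afterwards is probably worthwhile.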

Re: Grep on dictionary words

2009-11-28 Thread Andrew Sackville-West
On Sat, Nov 28, 2009 at 11:32:59AM -0600, Boyd Stephen Smith Jr. wrote:
> In <880dece00911280713n6193b8das6970e8a071fc2...@mail.gmail.com>, Dotan Cohen
> wrote:
> >Is there a way to grep the output of strings in order to only show
> >lines that contain words found in the aspell dictionary? Thanks

Re: Grep on dictionary words

2009-11-28 Thread Boyd Stephen Smith Jr.
In <880dece00911280713n6193b8das6970e8a071fc2...@mail.gmail.com>, Dotan Cohen wrote:
>Is there a way to grep the output of strings in order to only show
>lines that contain words found in the aspell dictionary? Thanks in
>advance.
I once wrote a small program against the aspell API to do somethin

Grep on dictionary words

2009-11-28 Thread Dotan Cohen
I have a long binary file (about 12 MB) that I need to extract the text from via "strings". Naturally, there are a lot of junk lines such as these:
pDuf
#k0H}g)
GoV5
rLeY1
TMlq,*
Is there a way to grep the output of strings in order to only show lines that contain words found in the aspell diction
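As a rough sketch of what the question asks for, grep can take its patterns from a word-list file and keep only lines containing at least one of those words. The word list and sample input below are hypothetical; in practice the list would have to be exported from a dictionary first, and how best to do that with aspell is not shown in the thread.

```shell
# Hypothetical tiny word list, one word per line (not a real dictionary).
words=$(mktemp)
printf '%s\n' extract text binary file > "$words"

# Hypothetical stand-in for the output of: strings somefile
sample='pDuf
#k0H}g)
Cannot open file
GoV5
text segment
TMlq,*'

# -F: treat patterns as fixed strings, -w: match whole words only,
# -i: ignore case, -f: read the patterns from the word-list file.
result=$(printf '%s\n' "$sample" | grep -F -w -i -f "$words")
printf '%s\n' "$result"
rm -f "$words"
```

Here only the lines "Cannot open file" and "text segment" are kept. With a full dictionary, very short entries would let a lot of junk through, so it may help to drop words below some minimum length from the list first.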