Oh, sorry... this way, of course:

Dim ii As Integer
Dim sStr As String = "abc. def!!! ghi? jkl: (mno)"
Dim sWords As String[]

' The fourth argument (IgnoreVoid) drops the empty strings produced by
' consecutive separators. Expand the separator list as you will.
sWords = Split(sStr, " .!?:()", "", True)

For ii = 0 To sWords.Max
  Print sWords[ii]
Next

Jussi

On Sun, Jun 18, 2017 at 6:29 AM, Jussi Lahtinen <jussi.lahti...@gmail.com> wrote:

> It's not a problem.
>
> Dim ii As Integer
> Dim sStr As String = "abc. def!!! ghi? jkl: (mno)"
> Dim sWords As String[]
>
> sWords = Split(sStr, " .!?:()") ' Expand as you will.
>
> ' Remove the empty strings left between consecutive separators.
> ii = 0
> Do
>   If sWords[ii] = "" Then
>     sWords.Remove(ii)
>   Else
>     Inc ii
>   Endif
> Loop Until ii > sWords.Max
>
> For ii = 0 To sWords.Max
>   Print sWords[ii]
> Next
>
> Jussi
>
> On Sun, Jun 18, 2017 at 4:53 AM, Fernando Cabral <fernandojosecab...@gmail.com> wrote:
>
>> Jussi, what you suggest will not work. You have presumed the only
>> separator is a single space.
>> This is not the case. Between any two words you can have any number of
>> non-alphabetic characters.
>> It could be, for instance, "abc. def!!! ghi? jkl: (mno)" and so forth.
>> This means the definition of a word is "any sequence of alphabetic
>> characters followed by any sequence of non-alphabetic characters".
>>
>> That's why your suggestion does not apply.
>>
>> - fernando
>>
>> 2017-06-17 21:21 GMT-03:00 Jussi Lahtinen <jussi.lahti...@gmail.com>:
>>
>>> I think I would do something like:
>>>
>>> Dim ii As Integer
>>> Dim sStr As String = "abc defg hijkl"
>>> Dim sWords As String[]
>>>
>>> sWords = Split(sStr, " ")
>>>
>>> For ii = 0 To 2
>>>   Print sWords[ii]
>>> Next
>>>
>>> Jussi
>>>
>>> On Sun, Jun 18, 2017 at 2:57 AM, Fernando Cabral <fernandojosecab...@gmail.com> wrote:
>>>
>>>> Tobi
>>>>
>>>> One more thing about the way I wish it could work (I remember having done
>>>> this in C perhaps 30 years ago). The pseudo-code below is pretty
>>>> schematic, but I think it will clarify the issue.
>>>>
>>>> Let p and l be arrays of integers and s be the string "abc defg hijkl"
>>>>
>>>> So, after traversing the string we would have the following result:
>>>> p[0] = offset of "a" (0)
>>>> l[0] = length of "abc" (3)
>>>> p[1] = offset of "d" (4)
>>>> l[1] = length of "defg" (4)
>>>> p[2] = offset of "h" (9)
>>>> l[2] = length of "hijkl" (5).
>>>>
>>>> After this, each word could be retrieved in the following manner:
>>>>
>>>> for i = 0 to 2
>>>>   print mid(s, p[i], l[i])
>>>> next
>>>>
>>>> I think this would be the most efficient way to do it. But I can't find
>>>> how to do it in Gambas using Regex.
>>>>
>>>> Regards
>>>>
>>>> - fernando
>>>>
>>>> 2017-06-17 18:06 GMT-03:00 Tobias Boege <tabo...@gmail.com>:
>>>>
>>>> > On Sat, 17 Jun 2017, Fernando Cabral wrote:
>>>> > > Still beating my head against the wall due to my lack of knowledge about
>>>> > > the PCRE methods and properties... Because of this, I have progressed not
>>>> > > only very slowly but also -- I feel -- in a very inelegant way. So perhaps
>>>> > > you guys who are more acquainted with PCRE might be able to give me a
>>>> > > hint on a better solution.
>>>> > >
>>>> > > I want to search a long string that can contain a sentence, a paragraph or
>>>> > > even a full text. I want to find and isolate every word it contains. A word
>>>> > > is defined as any sequence of alphabetic characters followed by a
>>>> > > non-alphabetic character.
>>>> > >
>>>> >
>>>> > The Mathematician in me can't resist pointing this out: you hopefully wanted
>>>> > to define "word in a string" as "a *longest* sequence of alphabetic characters
>>>> > followed by a non-alphabetic character (or the end of the string)". Using your
>>>> > definition above, the words in "abc:" would be "c", "bc" and "abc", whereas
>>>> > you probably only wanted "abc" (the longest of those).
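
The offset-and-length scheme from the pseudo-code, combined with the
"longest sequence" definition, can be sketched without regular expressions.
This is only a minimal sketch under assumptions that are not in the thread:
the text is plain ASCII (so the byte-based Len(), Mid$() and IsLetter() are
adequate), offsets are 1-based to match Mid$() rather than 0-based as in the
pseudo-code, and iStart is a hypothetical helper variable.

```gambas
#!/usr/bin/gbs3

Public Sub Main()

  Dim s As String = "abc. def!!! ghi? jkl: (mno)"
  Dim p As New Integer[]      ' 1-based start offset of each word
  Dim l As New Integer[]      ' length of each word
  Dim i As Integer
  Dim iStart As Integer = 0   ' hypothetical helper: 0 means "not inside a word"

  ' Scan one position past the end so that a trailing word is closed too.
  For i = 1 To Len(s) + 1
    If i <= Len(s) And If IsLetter(Mid$(s, i, 1)) Then
      ' First letter of a new run: remember where it starts.
      If iStart = 0 Then iStart = i
    Else If iStart > 0 Then
      ' The run of letters just ended: record the longest possible word.
      p.Add(iStart)
      l.Add(i - iStart)
      iStart = 0
    Endif
  Next

  ' Retrieve each word from its offset and length.
  For i = 0 To p.Max
    Print Mid$(s, p[i], l[i])
  Next

End
```

Because a word is only recorded when its run of letters ends, each entry is
the longest run starting at that offset, so "abc:" yields "abc" and never
"bc" or "c". For UTF-8 text the same loop would need String.Len() and
String.Mid() instead, at the cost of the extra reading discussed in the
thread.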
>>>> >
>>>> > > The sample code below does work, but I don't feel it is as elegant and as
>>>> > > fast as it could and should be. Especially the way I am traversing the
>>>> > > string from the beginning to the end. It looks awkward and slow. There must
>>>> > > be a more efficient way, like working only with offsets and lengths instead
>>>> > > of copying the string again and again.
>>>> > >
>>>> >
>>>> > You think worse of String.Mid() than it deserves, IMHO. Gambas strings
>>>> > are triples of a pointer to some data, a start index and a length, and
>>>> > the built-in string functions take care not to copy a string when it's
>>>> > not necessary. The plain Mid$() function (dealing with ASCII strings only)
>>>> > is implemented as a constant-time operation which simply takes your input
>>>> > string and adjusts the start index and length to give you the requested
>>>> > portion of the string. The string doesn't even have to be read, much less
>>>> > copied, to do this.
>>>> >
>>>> > Now, the String.Mid() function is somewhat more complicated, because
>>>> > UTF-8 strings have variable-width characters, which makes it difficult
>>>> > to map byte indices to character positions. To implement String.Mid(),
>>>> > your string has to be read, but, again, not copied.
>>>> >
>>>> > Extracting a part of a string is a non-destructive operation in Gambas
>>>> > and no copying takes place. (Concatenating strings, on the other hand,
>>>> > will copy.) So, there is some reading overhead (if you need UTF-8 strings),
>>>> > but it's smaller than you probably thought.
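
The difference between the byte-based functions and their UTF-8-aware
String counterparts is easy to see on a short accented word. The following
is a small illustrative sketch; the sample word is an assumption, and the
byte counts in the comments assume the source file is saved as UTF-8.

```gambas
#!/usr/bin/gbs3

Public Sub Main()

  ' Four characters; "ç" and "ã" each occupy two bytes in UTF-8.
  Dim s As String = "ação"

  Print Len(s)                ' length in bytes: 6
  Print String.Len(s)         ' length in characters: 4
  Print String.Mid(s, 2, 2)   ' second and third characters: çã

End
```

Mid$(s, 2, 2) on the same string would return the two bytes of "ç" only,
which is why text containing non-ASCII letters needs the String.* variants.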
>>>> >
>>>> > > Dim Alphabetics As String = "abc...zyzABC...ZYZ"
>>>> > > Dim re As New RegExp
>>>> > > Dim matches As New String[]
>>>> > > Dim RawText As String
>>>> > > Dim i As Integer
>>>> > >
>>>> > > re.Compile("([" & Alphabetics & "]+?)([^" & Alphabetics & "]+)", RegExp.UTF8)
>>>> > > RawText = "abc12345def ghi jklm mno p1"
>>>> > >
>>>> > > Do While RawText
>>>> > >   re.Exec(RawText)
>>>> > >   If re.Offset = -1 Then Break ' no further match: stop instead of looping forever
>>>> > >   matches.Add(re[1].Text)
>>>> > >   RawText = String.Mid(RawText, String.Len(re.Text) + 1)
>>>> > > Loop
>>>> > >
>>>> > > For i = 0 To matches.Count - 1
>>>> > >   Print matches[i]
>>>> > > Next
>>>> > >
>>>> > >
>>>> > > The above code correctly finds "abc, def, ghi, jklm, mno, p". But the tricks I
>>>> > > have used are cumbersome (like advancing with String.Mid() and resorting to
>>>> > > re[1].Text and re.Text).
>>>> > >
>>>> >
>>>> > Well, I think you can't use PCRE alone to solve your problem, if you want
>>>> > to capture a variable number of words in your submatches. I did a bit of
>>>> > reading and from what I gather [1][2] capturing group numbers are assigned
>>>> > based on the verbatim regular expression, i.e. the number of submatches
>>>> > you can receive is limited by the number of "(...)" constructs in your
>>>> > expression; and the (otherwise very nifty) recursion operator (?R) does
>>>> > not give you an unlimited number of capturing groups, sadly.
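
The capturing-group limit can be seen with a pattern that repeats a single
group. This is a minimal sketch, not code from the thread: the pattern and
the subject string are assumptions chosen for illustration, and only the
RegExp members already used in this thread (Compile, Exec, Offset and
r[1].Text) appear in it.

```gambas
#!/usr/bin/gbs3

Use "gb.pcre"

Public Sub Main()

  Dim r As New RegExp

  ' A single pair of capturing parentheses, repeated by the
  ' non-capturing outer group (?: ... )+ once per word.
  r.Compile("(?:([[:alpha:]]+)[[:^alpha:]]*)+", RegExp.UTF8)
  r.Exec("abc def ghi")

  ' The group takes part in three iterations, but there is still only one
  ' submatch, and PCRE keeps the text of the last iteration in it.
  If r.Offset >= 0 Then Print r[1].Text

End
```

The earlier words "abc" and "def" are matched but not retrievable from the
submatches, which is why some loop outside the regular expression is needed
to collect every word.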
>>>> >
>>>> > Anyway, I think by changing your regular expression, you can let PCRE take
>>>> > care of the string advancement, like so:
>>>> >
>>>> > #!/usr/bin/gbs3
>>>> >
>>>> > Use "gb.pcre"
>>>> >
>>>> > Public Sub Main()
>>>> >   Dim r As New RegExp
>>>> >   Dim s As String
>>>> >
>>>> >   r.Compile("([[:alpha:]]+)[[:^alpha:]]+(.*$)", RegExp.UTF8)
>>>> >   s = "abc12345def ghi jklm mno p1"
>>>> >   Print "Subject:";; s
>>>> >   Do
>>>> >     r.Exec(s)
>>>> >     If r.Offset = -1 Then Break
>>>> >     Print " ->";; r[1].Text
>>>> >     s = r[2].Text
>>>> >   Loop While s
>>>> > End
>>>> >
>>>> > Output:
>>>> >
>>>> > Subject: abc12345def ghi jklm mno p1
>>>> >  -> abc
>>>> >  -> def
>>>> >  -> ghi
>>>> >  -> jklm
>>>> >  -> mno
>>>> >  -> p
>>>> >
>>>> > But, I think, this is less efficient than using String.Mid(). The trailing
>>>> > group (.*$) _may_ make the PCRE library read the entire subject every time.
>>>> > And I believe gb.pcre will copy your submatch string when returning it.
>>>> > If you care deeply about this, you'll have to trace the code in gb.pcre
>>>> > and main/gbx (the interpreter) to see what copies strings and what doesn't.
>>>> >
>>>> > Regards,
>>>> > Tobi
>>>> >
>>>> > [1] http://www.regular-expressions.info/recursecapture.html
>>>> >     (Capturing Groups Inside Recursion or Subroutine Calls)
>>>> > [2] http://www.rexegg.com/regex-recursion.html
>>>> >     (Groups Contents and Numbering in Recursive Expressions)
>>>> >
>>>> > --
>>>> > "There's an old saying: Don't change anything... ever!" -- Mr. Monk
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > Check out the vibrant tech community on one of the world's most
>>>> > engaging tech sites, Slashdot.org!
>>>> > http://sdm.link/slashdot
>>>> > _______________________________________________
>>>> > Gambas-user mailing list
>>>> > Gambas-user@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/gambas-user
>>>> >
>>>>
>>>> --
>>>> Fernando Cabral
>>>> Blog: http://fernandocabral.org
>>>> Twitter: http://twitter.com/fjcabral
>>>> e-mail: fernandojosecab...@gmail.com
>>>> Facebook: f...@fcabral.com.br
>>>> Telegram: +55 (37) 99988-8868
>>>> Wickr ID: fernandocabral
>>>> WhatsApp: +55 (37) 99988-8868
>>>> Skype: fernandojosecabral
>>>> Landline: +55 (37) 3521-2183
>>>> Mobile: +55 (37) 99988-8868
>>>>
>>>> As long as there is a single person in the world without a home or
>>>> without food, no politician or scientist can boast of anything.