> On Jan 16, 2018, at 2:18 PM, Michael Ilseman via swift-evolution
> <[email protected]> wrote:
>
> (Replying to both Eneko and George at once)
>
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose*
>>>>> regular expression for Swift.
>
> It is certainly worth thought; even if we don’t go down that path there are
> lessons to pick up along the way. I believe “verbal expressions” is basically
> what you’re describing:
> https://github.com/VerbalExpressions/SwiftVerbalExpressions
>
>
>> On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>> Thank you for the reply. The part I didn’t understand is whether giving names
>> to the captured groups would be mandatory. Hopefully not.
>>
>> Assuming the user does not need names, the groups could be captured in an
>> unlabeled tuple.
>>
>
> I mention this through use of ‘_’.
>
> A construct like (let _ = \d+) could produce an unlabeled tuple element.
>
>
>
> Thinking about explicit capture names, etc., is all subject to change based
> on more investigation and playing around with examples. See my email exchange
> with John Holdsworth, where most names end up being redundant with
> destructuring at their only use site. That may have just been overly
> simplistic examples, but maybe not.
>
>
>> Digits could always be inferred to be numeric (Int) and they should always
>> be “exact” (to match "\d"):
>>
>> let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" +
>> .digits(4)
>>
>
> What if you want to match a sequence of digits that are too large to fit in
> an Int? For example, the market cap of any stock in the S&P 500 would
> overflow Int on 32-bit platforms. Having the default represent a portion of
> the input (whether that be Substring or just a Range) is more faithful to the
> purposes of captures, which is matching parts of text. Explicitly specifying
> a type is syntax for passing the capture into an init that serves as both a
> capture-validator as well as a value constructor, which is really just yet
> another kind of Pattern. (This might be generalizable to use beyond regexes,
> but that’s a whole other digression.) This also aids discovery, as you know
> what type’s conformance to RegexSubmatchableiblewobble to check.
>
> (Note that some way to get slices or ranges will always be important for
> things like case-insensitive matching: changing case can change the number of
> graphemes in a string).
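To make the Int concern concrete, here is a minimal sketch in plain Swift with no regex support assumed. `leadingDigits(of:)` is a hypothetical helper of mine standing in for a `\d+` capture; it hands back a slice of the input, and converting that slice to a number is a separate step that can fail:

```swift
// Hypothetical helper, not part of any proposed API: returns the leading run
// of ASCII digits as a slice (Substring) of the input.
func leadingDigits(of text: String) -> Substring {
    return text.prefix(while: { $0.isASCII && $0.isNumber })
}

let marketCap = "3000000000 USD"
let capture = leadingDigits(of: marketCap)  // the text that matched

// Conversion acts as a validator: it fails when the digits do not fit the type.
let narrow = Int32(capture)  // nil, since 3_000_000_000 > Int32.max
let wide = Int64(capture)    // 3_000_000_000

// Case conversion can change the number of graphemes, so positions computed on
// a case-converted copy would not line up with the original string.
let sharpS = "ß"
let upper = sharpS.uppercased()  // "SS"
print(sharpS.count, upper.count) // prints "1 2"
```

Keeping the capture as a Substring means the match itself never fails just because the digits do not fit a particular integer type.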
>
>
>> Personally, I like the `.optional` better than `.oneOrZero`:
>>
>> let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" +
>> .digits(4)
>>
>> Would it be possible to support both condensed and extended syntax?
>>
>> let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /
>>
>> Maybe only extended (verbose) syntax would support named groups?
>>
>
> “\d” is just syntax for a built-in character class named “digit”. There will
> be some way to use a character class, whether built-in or user-defined, in a
> regex.
>
> For example, in Perl 6, you can say “\d” or “<digit>”, both of which are
> equivalent. Shortcuts for some built-in character classes are convenient and
> leverage the collective understanding of regexes amongst developers, and I
> don’t think they cause harm.
>
>> Eneko
>>
>>
>>> On Jan 16, 2018, at 10:01 AM, George Leontiev <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> @Eneko While it sure seems possible to specify the type, I think this would
>>> go against the salient point “If something’s worth capturing, it’s worth
>>> giving it a name.” Putting the name further away seems like a step backward.
>>>
>>>
>>> I could imagine a slightly more succinct syntax where things like
>>> .numberFromDigits are replaced by protocol conformance of the bound type:
>>> ```
>>> extension Int: Regexable {
>>> func baseRegex<T>() -> Regex<T, Int>
>>> }
>>> let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
>>> /let routing: Int/.exactDigits(3) + "-" +
>>> /let local: Int/.exactDigits(4)
>>> ```
>>>
>>> In this model, the `//` syntax will only be used for initial binding and
>>> swifty transformations will build the final regex.
>>>
>>>
>>>> On Jan 16, 2018, at 9:20 AM, Eneko Alonso via swift-evolution
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>
>>>> Could it be possible to specify the regex type ahead avoiding having to
>>>> specify the type of each captured group?
>>>>
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local:
>>>> Int)> = /
>>>> (\d{3}?) -
>>>> (\d{3}) -
>>>> (\d{4}) /
>>>>
>>>> “Verbose” alternative:
>>>>
>>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local:
>>>> Int)> = /
>>>> .optional(.numberFromDigits(.exactly(3)) + "-") +
>>>> .numberFromDigits(.exactly(3)) + "-" +
>>>> .numberFromDigits(.exactly(4)) /
>>>> print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?,
>>>> routing: Int, local: Int)>
>>>>
>>>>
>>>> Thanks,
>>>> Eneko
>>>>
>>>>
>>>>> On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>
>>>>> Thanks, Michael. This is very interesting!
>>>>>
>>>>> I wonder if it is worth considering (for lack of a better word) *verbose*
>>>>> regular expression for Swift.
>>>>>
>>>>> For instance, your example:
>>>>> ```
>>>>> let usPhoneNumber = /
>>>>> (let area: Int? <- \d{3}?) -
>>>>> (let routing: Int <- \d{3}) -
>>>>> (let local: Int <- \d{4}) /
>>>>> ```
>>>>> would become something like (strawman syntax):
>>>>> ```
>>>>> let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ +
>>>>> "-" +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> With this format, I also noticed that your code wouldn't match
>>>>> "555-5555", only "-555-5555", so maybe it would end up being something
>>>>> like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <-
>>>>> .numberFromDigits(.exactly(3))/ + "-") +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> ```
>>>>> Notice that `area` is initially a non-optional `Int`, but becomes
>>>>> optional when transformed by the `optional` directive.
>
> That is a good catch and illustrates some of the traps of regexes and the
> need to pick the right syntax. BTW, when you say optional, does it mean the
> match didn’t happen or the capture-validation didn’t succeed? In this
> example, it seems like the inclusive-or of both.
Yes, it would be the inclusive-or. This is a good example of your point above
about how capture-validation and matching can be conflated. I can’t immediately
think of a good way to make this explicit, but being able to write
/let area: Int/ to match “something that can decode to Int” feels very convenient.
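One possible way to keep the two cases apart, purely as a sketch: have the capture report three outcomes instead of collapsing them into an Optional. `CaptureResult` and `areaCode(in:)` are hypothetical names of mine, and the hand-written splitting below only stands in for whatever the regex would do:

```swift
// Hypothetical result type: separates "the optional group did not match"
// from "the group matched, but the text failed validation".
enum CaptureResult<Value: Equatable>: Equatable {
    case absent              // the optional group did not take part in the match
    case invalid(Substring)  // text was captured but could not be validated
    case valid(Value)        // text was captured and converted
}

// Stand-in for matching an optional "ddd-" prefix in front of "ddd-dddd".
func areaCode(in phone: String) -> CaptureResult<Int> {
    let parts = phone.split(separator: "-", omittingEmptySubsequences: false)
    guard parts.count == 3 else { return .absent }
    let text = parts[0]
    guard text.count == 3,
          text.allSatisfy({ $0.isASCII && $0.isNumber }),
          let value = Int(text) else {
        return .invalid(text)
    }
    return .valid(value)
}

print(areaCode(in: "555-5555"))      // absent
print(areaCode(in: "415-555-5555"))  // valid(415)
print(areaCode(in: "4x5-555-5555"))  // invalid("4x5")
```

With a plain `Int?`, the first and third calls would both come back as nil, which is exactly the conflation discussed above.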
>
>>>>> Other directives may be:
>>>>> ```
>>>>> let decimal = /let beforeDecimalPoint: Int <--
>>>>> .numberFromDigits(.oneOrMore)/ +
>>>>> .optional("." + /let afterDecimalPoint: Int <--
>>>>> .numberFromDigits(.oneOrMore)/)
>>>>> ```
>>>>>
>>>>> In this world, the `/<--/` format will only be used for explicit binding,
>>>>> and the rest will be inferred from generic `+` operators.
>>>>>
>>>>>
>>>>> I also think it would be helpful if `Regex` was generic over all sequence
>>>>> types.
>>>>> Going back to the phone example, this would look something like:
>>>>> ```
>>>>> let usPhoneNumber = .optional(/let area: Int <-
>>>>> .numberFromDigits(.exactly(3))/ + "-") +
>>>>> /let routing: Int <- .numberFromDigits(.exactly(3))/
>>>>> + "-" +
>>>>> /let local: Int <- .numberFromDigits(.exactly(4))/
>>>>> print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?,
>>>>> routing: Int, local: Int)>
>>>>> ```
>>>>> Note the addition of `UnicodeScalar` to the signature of `Regex`. Other
>>>>> interesting signatures are `Regex<JSONToken, JSONEnumeration>` or
>>>>> `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers
>>>>> becomes fun!
>>>>>
>
> I think I missed something. What does the `UnicodeScalar` type parameter do?
I was just commenting that we may want to match regexes against things other
than strings. Regex<UnicodeScalar, T> would operate over strings (sequences of
UnicodeScalar), but being able to create regexes for arbitrary non-string
sequences may be useful as well.
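As a rough sketch of what matching over non-string sequences could look like, here is a tiny combinator type that is generic over the element type. `SequencePattern`, `element`, `optional`, and `matchesWhole` are hypothetical names of mine, nothing here is proposed API, and there are no captures and no backtracking:

```swift
struct SequencePattern<Element: Equatable> {
    // Consumes a prefix of the input on success and returns what is left;
    // returns nil when the pattern does not match at the start.
    let match: (ArraySlice<Element>) -> ArraySlice<Element>?

    // Matches exactly one element equal to `expected`.
    static func element(_ expected: Element) -> SequencePattern {
        return SequencePattern(match: { input in
            guard let first = input.first, first == expected else { return nil }
            return input.dropFirst()
        })
    }

    // Matches `inner` if possible, otherwise matches nothing (greedy).
    static func optional(_ inner: SequencePattern) -> SequencePattern {
        return SequencePattern(match: { input in inner.match(input) ?? input })
    }

    // Concatenation: `lhs` followed by `rhs`.
    static func + (lhs: SequencePattern, rhs: SequencePattern) -> SequencePattern {
        return SequencePattern(match: { input in
            guard let rest = lhs.match(input) else { return nil }
            return rhs.match(rest)
        })
    }

    // True when the pattern consumes the entire sequence.
    func matchesWhole<S: Sequence>(_ input: S) -> Bool where S.Element == Element {
        let elements = Array(input)
        guard let rest = match(elements[...]) else { return false }
        return rest.isEmpty
    }
}

// The same combinators over a made-up token type...
enum Token: Equatable { case minus, number }
let minus = SequencePattern<Token>.element(.minus)
let number = SequencePattern<Token>.element(.number)
let signedNumber = SequencePattern<Token>.optional(minus) + number
print(signedNumber.matchesWhole([Token.minus, Token.number]))  // true

// ...and over the Unicode scalars of a string.
let dash = SequencePattern<UnicodeScalar>.element("-")
let five = SequencePattern<UnicodeScalar>.element("5")
let signedDigit = SequencePattern<UnicodeScalar>.optional(dash) + five
print(signedDigit.matchesWhole("-5".unicodeScalars))  // true
```

The only requirement on the element type in this sketch is Equatable, which is why the token and string cases can share one implementation.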
>
>>>>> - George
>>>>>
>>>>>> On Jan 10, 2018, at 11:58 AM, Michael Ilseman via swift-evolution
>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>
>>>>>> Hello, I just sent an email to swift-dev titled "State of String: ABI,
>>>>>> Performance, Ergonomics, and You!" at
>>>>>> https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html,
>>>>>> whose gist can be found at
>>>>>> https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I
>>>>>> posted to swift-dev as much of the content is from an implementation
>>>>>> perspective, but it also addresses many areas of potential evolution.
>>>>>> Please refer to that email for details; here’s the recap from it:
>>>>>>
>>>>>> ### Recap: Potential Additions for Swift 5
>>>>>>
>>>>>> * Some form of unmanaged or unsafe Strings, and corresponding APIs
>>>>>> * Exposing performance flags, and some way to request a scan to populate
>>>>>> them
>>>>>> * API gaps
>>>>>> * Character and UnicodeScalar properties, such as isNewline
>>>>>> * Generalizing, and optimizing, String interpolation
>>>>>> * Regex literals, Regex type, and generalized pattern match destructuring
>>>>>> * Substitution APIs, in conjunction with Regexes.
>>>>>>
>>>>>> _______________________________________________
>>>>>> swift-evolution mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>>> <https://lists.swift.org/mailman/listinfo/swift-evolution>