Why not make all keywords soft in Python?

In Python we have the concept of a soft keyword: a word that is reserved only in certain contexts (e.g. match, case and type). From what I understand, language designers usually try to minimize the number of reserved words in a language. So if a keyword can be soft, why not make every keyword soft? What's the problem with using soft keywords for all reserved words?

I want to rephrase my question to make it more concise:

Why don't language designers try to maximize the use of soft keywords? There are several hard keywords in Python that could be soft keywords (e.g. def, class, import). If language designers want their language to have the fewest possible reserved words (if this assumption is wrong, please correct me), then why don't they turn these hard keywords into soft ones?

Soft keywords in Python

If language designers want their language to have the fewest possible reserved words (if this assumption is wrong, please correct me), then why don't they turn these hard keywords into soft ones?

That assumption is wrong. I've been on the design committees for Visual Basic, VBScript, JavaScript, C# and others, and in zero of those design committees did we ever even consider "minimize the number of reserved words" as a design goal. Heck, Netscape reserved all the keywords of Java in JavaScript -- remember, JavaScript was called that for marketing reasons, not because it has anything whatsoever to do with Java -- just because someday they might want any given Java feature in JavaScript.

Why don't language designers try to maximize use of soft keywords?

Because that would work directly against the primary goal of designers of modern general-purpose line-of-business languages! "Maximize soft keywords" is a special case of the more general principle "make as many strings as possible legal programs to maximize the user's expressive choices", and that's not a design principle that modern LOB languages care about very much.

Rather, as general-purpose LOB language designers, we primarily seek to restrict the users' choices to encourage them to write understandable programs that can be easily seen to be correct. We famously described C# as a language that "throws you into the Pit of Success" -- the design of the language should force you to work harder to write a bad, incomprehensible program than to write a good, correct, comprehensible program. Having too many soft keywords works strongly against that goal, and buys you no expressive power in return!

Allowing class and for and while and if to be legal identifiers maximizes developer choice and expressiveness at the cost of not just complicating the grammar, but at the cost of making it easier to write incomprehensible programs! This is legal VBScript, for example:

That's not great.

Now, as language designers we do care that users be able to express their business logic using the jargon of that business domain, and that occasionally means that a user will run into a conflict with an existing keyword. "Make all keywords soft" is a bad solution to this problem. A better solution is "make a syntax for escaping any keyword". In C# for example, any keyword preceded by @ is an identifier, so if you do feel compelled to write:

you can do so. In Visual Basic, almost any text can be enclosed in [] and it becomes an identifier:

Is gross, but legal, if you really want to do it.

Summing up: maximizing soft keywords is not a goal in itself, works against legibility, and is unnecessary to achieve lesser goals such as expressivity. That's why we don't do it.

The big disadvantage of soft keywords is that they make things ambiguous. That means we have to come up with rules to resolve the ambiguities. This makes the grammar more complex. This is a problem both for programs and humans:

Who wants to read code like if = then + else;? Who wants to remember how to figure out what it means?

Minimizing the number of reserved words is not necessarily a goal in all languages. If the goal is "any word should be usable as an identifier", there are other design choices that could accomplish it, without making things ambiguous.

For example, keywords can be used as identifiers by using @ in C# (int @if = 3;), or [] in SQL: [if].

In the case of the [] syntax, it can also do more, like allowing spaces in identifiers.

As Eric mentioned, the case where soft keywords shine is when adding keywords to an existing language. In that case, the ambiguity is still a problem, but it is a price paid to avoid breaking backwards compatibility for existing code that already uses that word as an identifier.

Eldritch Continuum's answer explains the general issue: contextual or "soft" keywords can create parsing ambiguities if you aren't careful. And Eric Lippert's comment explains that it is not generally desirable to have soft keywords, but rather they exist only when new keywords must be added to the language without creating syntax errors in existing code. (Indeed, when match was added as a language feature, it was already in use as an identifier even in the standard library.)

Still, I thought it would be worth adding some concrete examples to show that the grammar would be ambiguous if certain "hard" keywords were instead "soft".

Most examples come from the fact that a name followed by parentheses is parsed as a function call expression. But the keywords return, yield, await, raise, assert, del and not already have other meanings when followed by expressions, which may be parenthesised. So for example return (1 + 1) could be parsed either as a return statement, or as a function call.

The same is true with square brackets, which are used for subscripts. For example return [0] could either return a new list, or access the first element from a list named return.
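One way to see both readings side by side is to ask the ast module how today's parser treats the keyword versus an ordinary name in the same position. This is only a sketch: result is a hypothetical identifier standing in for what return would be if it were soft.

```python
import ast

def first_stmt(source):
    """Wrap one line in a function body and return its AST statement node."""
    wrapped = "def f():\n    " + source + "\n"
    return ast.parse(wrapped).body[0].body[0]

# With the hard keyword, both lines are return statements.
ret_paren = first_stmt("return (1 + 1)")
ret_index = first_stmt("return [0]")

# With an ordinary name, the same shapes are a call and a subscript.
call_paren = first_stmt("result (1 + 1)")
sub_index = first_stmt("result [0]")

for node in (ret_paren, call_paren, ret_index, sub_index):
    inner = node.value
    print(type(node).__name__, type(inner).__name__)
```

A soft return would force the parser to choose between these two trees for the same text.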

Similarly, the keyword from could become ambiguous in expressions like yield from (foo), which would be either a yield from expression or a yield expression which yields the result of calling from(foo).

The keywords break, continue and pass (as well as raise, return and yield) are each complete statements by themselves. But Python's grammar also allows any expression to occur where a statement is expected, so these could instead mean a statement consisting of a simple name expression, which does nothing (or raises a NameError if the name doesn't exist).
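A minimal sketch of the two readings, using skip as a hypothetical undefined name in place of a soft pass:

```python
import ast

# The keyword on its own line is a dedicated statement node.
keyword_stmt = ast.parse("pass").body[0]

# A bare name on its own line is an expression statement.
name_stmt = ast.parse("skip").body[0]

# Running the bare name looks it up, which fails if it is undefined.
try:
    exec("skip", {})
except NameError as exc:
    outcome = type(exc).__name__
else:
    outcome = "no error"

print(type(keyword_stmt).__name__, type(name_stmt).__name__, outcome)
```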

The literals None, True and False are also keywords, so if they were soft keywords then they would be ambiguously parsed as either literals or simple names. Perhaps you could call this a kind of shadowing, but at least grammatically these are supposed to be separate categories.
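The separation between the two categories is visible in the syntax tree. The sketch below assumes Python 3.8 or later (where literals parse to ast.Constant) and uses the lowercase name none as an ordinary identifier for contrast.

```python
import ast

literal = ast.parse("None", mode="eval").body   # a constant node
name = ast.parse("none", mode="eval").body      # a name lookup

# Because True is a hard keyword, rebinding it is rejected at compile time.
try:
    compile("True = 0", "<example>", "exec")
except SyntaxError:
    rebinding_rejected = True
else:
    rebinding_rejected = False

print(type(literal).__name__, type(name).__name__, rebinding_rejected)
```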

The control-flow keywords else, try, except and finally may be immediately followed by a : with another statement on the same line. But the same : can also follow a simple name to provide a type annotation in an annotated assignment statement (where the assigned value is optional). In this example, imagine foo, baz and quux are types:
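The overlap can be sketched with the ast module. Here handler is a hypothetical ordinary name, and foo and quux are never evaluated because the code is only parsed, not run.

```python
import ast

# A bare name followed by ": type" is an annotated assignment without a value.
annotated = ast.parse("handler: quux").body[0]

# The keywords `try` and `finally` followed by ":" start clauses of one statement.
tried = ast.parse("try: foo\nfinally: quux").body[0]

print(type(annotated).__name__, type(tried).__name__)
```

If try and finally were soft, the second source could also be read as two annotated assignments, one per line.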

There are other keywords like if, for, while, and, or, and so on, which don't appear to create ambiguities in the grammar if they were made into soft keywords. But doing so would transform a lot of common syntax errors into code which is still invalid, but whose syntax error is misleading or confusing. For instance, the following code has a missing : after the if condition, but this would be parsed as a function call, and the error message would tell the user that the next line is not indented correctly (even though it is, according to the user's intent):
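A minimal sketch of that effect, assuming Python 3.10 or later and using the ordinary name check (a hypothetical stand-in) to play the role of a soft if:

```python
def syntax_error_for(source):
    """Return the exception raised when compiling `source`, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return exc
    return None

missing_colon = "if (x > 0)\n    print(x)\n"
as_a_call = "check (x > 0)\n    print(x)\n"

err_if = syntax_error_for(missing_colon)
err_call = syntax_error_for(as_a_call)

# Report the kind of error and the line it is attributed to.
print(type(err_if).__name__, err_if.lineno, err_if.msg)
print(type(err_call).__name__, err_call.lineno, err_call.msg)
```

With the hard keyword, the error is attributed to the if line itself; with the call reading, the complaint moves to the indented line below it.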

Or, a syntax mistake might result in code which is syntactically valid by accident and means the wrong thing. For instance, this code would raise a NameError at runtime, since there is no function named global:
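To sketch the difference, the snippet below compares today's behaviour with what a call reading would do. The names declare and counter are hypothetical; declare stands in for a soft global.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# With `global` as a hard keyword, the mistaken call syntax is rejected early.
rejected_today = not compiles("def f():\n    global(counter)\n")

# The same shape with an ordinary, undefined name compiles and fails only when run.
code = compile("declare(counter)", "<example>", "exec")
try:
    exec(code, {"counter": 0})
except NameError as exc:
    runtime_error = type(exc).__name__
else:
    runtime_error = None

print(rejected_today, runtime_error)
```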

It's true that if you really wanted to, you could make all of these keywords "soft" without making the grammar ambiguous, if you identify every possible case where the grammar would be ambiguous, and add rules to the grammar to disambiguate them. For instance, you could define that a function call is like expr ( ... ) where expr must not be a simple name expression whose identifier is a soft keyword, and that would disambiguate many of the examples above.
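As a rough sketch of what such a rule would have to check, the function below walks a syntax tree and reports direct calls through a name from an imagined soft-keyword set. Both IMAGINED_SOFT and forbidden_calls are hypothetical, and this is not how CPython's grammar works today (match(x) is currently a perfectly valid call).

```python
import ast

# Hypothetical set of names treated as soft keywords for this sketch.
IMAGINED_SOFT = {"match", "case", "type"}

def forbidden_calls(source):
    """List names called directly that the imagined rule would forbid."""
    tree = ast.parse(source)
    return [
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)   # only simple names, not re.match
        and node.func.id in IMAGINED_SOFT
    ]

print(forbidden_calls("match(x)\nprint(case)\nre.match(y)\n"))
```

Every other ambiguous position (subscripts, bare statements, annotations) would need its own rule of this kind.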

The problem is that identifying every possible case of ambiguity is hard. No doubt, there are more cases than I could think of when writing this answer. And the resulting grammar will be more complex, making it harder for users, too; programmers will need to know the disambiguation rules in order to understand what certain code means. Python's simple syntax is one of its biggest selling points, making it a relatively easy language for beginners, so there would need to be a strong reason to give that up.

The language specification motivates soft keywords purely as a mechanism for backwards compatibility:

[…] As soft keywords, their use in the grammar is possible while still preserving compatibility with existing code that uses these names as identifier names.

The path for keywords is clearly only from soft to hard, not the other way around. For example, await was introduced in Python 3.5 but only became a proper keyword in Python 3.7.
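On Python 3.7 or later this can be checked directly; a minimal sketch using the standard keyword module:

```python
import keyword

await_is_hard = keyword.iskeyword("await")

# A hard keyword can no longer be used as an assignment target.
try:
    compile("await = 1", "<example>", "exec")
except SyntaxError:
    usable_as_name = False
else:
    usable_as_name = True

print(await_is_hard, usable_as_name)
```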

(The async/await machinery predates the hard/soft keyword lingo introduced with the PEG parser in CPython 3.9. It is functionally equivalent, though.)

Overall, keep in mind that Python is an opinionated language. Just because it might be technically possible to make well-established keywords lose their meaning in tiny niches doesn’t mean Python considers that worth the effort, both for the language and the people using it.

Making keywords such as case soft has some limited value beyond backwards compatibility because they are contextual: they are only meaningful inside larger syntax constructs with clearly confined scope. Compare this to how keywords such as in already do double duty (as an operator and part of for loops) because they are not top-level constructs either. This is not the case for keywords such as def which are meaningful everywhere a statement is allowed.
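The double duty of in shows up as two different node types in the syntax tree. A small sketch (the names x and items are placeholders and are never evaluated):

```python
import ast

# As an operator, `in` produces a comparison node.
membership = ast.parse("x in items", mode="eval").body

# As part of a for loop, the same word is only punctuation of the statement.
loop = ast.parse("for x in items: pass").body[0]

print(type(membership).__name__, type(membership.ops[0]).__name__)
print(type(loop).__name__)
```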
