Claudia Sassen – StudiGer
Constituent structure is based on the observation that words combine with other words to form units. The evidence that a sequence of words forms such a unit is given by substitutability — that is, a sequence of words in a well-formed sentence can be replaced by a shorter sequence without rendering the sentence ill-formed. To clarify this idea, consider the following sentence:

The little bear saw the fine fat trout in the brook.

The fact that we can substitute He for The little bear (yielding He saw the fine fat trout in the brook) indicates that the latter sequence is a unit. By contrast, we cannot replace little bear saw in the same way.
Each sequence that forms a unit can in fact be replaced by a single word, and we end up with just two elements (Figure 2). If we now strip out the words apart from the topmost row, add an S node, and flip the figure over, we end up with a standard phrase structure tree. Each node in this tree (including the words) is called a constituent. As we will see in the next section, a grammar specifies how the sentence can be subdivided into its immediate constituents, and how these can be further subdivided until we reach the level of individual words.
As we saw in 1, sentences can have arbitrary length. Consequently, phrase structure trees can have arbitrary depth. The cascaded chunk parsers we saw in 4 can only produce structures of bounded depth, so chunking methods aren't applicable here. Let's start off by looking at a simple context-free grammar. By convention, the left-hand side of the first production is the start symbol of the grammar, typically S, and all well-formed trees must have this symbol as their root label. In NLTK, context-free grammars are defined in the nltk.grammar module.
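Concretely, a simple grammar can be built with nltk.CFG.fromstring. The following is a sketch modeled on the chapter's example; the exact production set in the chapter may differ:

```python
import nltk

# A small context-free grammar. By convention, the left-hand side of the
# first production (S) is the start symbol, and every well-formed parse
# tree must have S as its root label.
grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> 'saw' | 'ate' | 'walked'
NP -> 'John' | 'Mary' | 'Bob' | Det N | Det N PP
Det -> 'a' | 'an' | 'the' | 'my'
N -> 'man' | 'dog' | 'cat' | 'telescope' | 'park'
P -> 'in' | 'on' | 'by' | 'with'
""")

print(grammar1.start())             # the grammar's start symbol
print(len(grammar1.productions()))  # alternatives count as separate productions
```

Note that each alternative separated by | is stored as a separate production, so this grammar has 25 productions in total.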
Your Turn: Try developing a simple grammar of your own, using the recursive descent parser application, nltk.app.rdparser(). It comes already loaded with a sample grammar, but you can edit this as you please using the Edit menu. Change the grammar, and the sentence to be parsed, and run the parser using the autostep button.
If we parse the sentence The dog saw a man in the park using this grammar, we end up with two trees. Since our grammar licenses two trees for this sentence, the sentence is said to be structurally ambiguous. The ambiguity in question is called a prepositional phrase attachment ambiguity, as we saw earlier in this chapter. As you may recall, it is an ambiguity about attachment since the PP in the park needs to be attached to one of two places in the tree: either as a child of VP or else as a child of NP.
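To see the ambiguity concretely, here is a minimal sketch with a toy grammar of my own (not necessarily the chapter's exact production set) in which a chart parser returns both attachments:

```python
import nltk

# A toy grammar in which a PP can attach either to VP or to NP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
NP -> Det N | Det N PP
Det -> 'the' | 'a'
N -> 'dog' | 'man' | 'park'
V -> 'saw'
P -> 'in'
""")

parser = nltk.ChartParser(grammar)
sent = "the dog saw a man in the park".split()
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
```

One of the two trees attaches in the park under VP (the seeing happened in the park); the other attaches it under the object NP (the man was in the park).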
When the PP is attached to VP, the intended interpretation is that the seeing event happened in the park. However, if the PP is attached to NP, then it was the man who was in the park, and the agent of the seeing (the dog) might have been sitting on the balcony of an apartment overlooking the park. If you are interested in experimenting with writing CFGs, you will find it helpful to create and edit your grammar in a text file, say mygrammar.cfg. You can then load it into NLTK and parse with it. Make sure that you put a .cfg suffix on the filename.
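A minimal sketch of that workflow, using placeholder grammar contents of my own rather than the chapter's:

```python
import nltk

# Write a toy grammar to mygrammar.cfg, then load it and parse with it.
with open('mygrammar.cfg', 'w') as f:
    f.write("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'saw'
""")

# The 'file:' prefix tells nltk.data.load to read from the filesystem;
# the .cfg suffix tells it to interpret the contents as a CFG.
grammar1 = nltk.data.load('file:mygrammar.cfg')
sent = "the dog saw the cat".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)
trees = list(rd_parser.parse(sent))
for tree in trees:
    print(tree)
```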
If the command print(tree) produces no output, this is probably because your sentence sent is not admitted by your grammar. You can check which productions are currently in the grammar with for p in grammar1.productions(): print(p). When you write CFGs for parsing in NLTK, you cannot combine grammatical categories with lexical items on the right-hand side of the same production. In addition, you are not permitted to place multi-word lexical items on the right-hand side of a production.
A grammar is said to be recursive if a category occurring on the left-hand side of a production also appears on the right-hand side of a production. To see how recursion arises from such a grammar, consider the following trees. We've only illustrated two levels of recursion here, but there's no upper limit on the depth.
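A sketch of what such a recursive grammar can look like, with toy productions of my own: here the category S recurs on the right-hand side of VP -> V S, so sentential complements can nest without limit.

```python
import nltk

# S appears on the right-hand side of VP -> V S, so the grammar is
# recursive: a sentence can embed a sentence, which can embed another
# sentence, and so on to any depth.
grammar2 = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | PropN
VP -> V S | V NP
PropN -> 'Buster' | 'Chatterer'
Det -> 'the'
N -> 'bear' | 'squirrel'
V -> 'said' | 'thought' | 'saw'
""")

parser = nltk.ChartParser(grammar2)
# Two levels of embedding: said [ thought [ ... ] ]
sent = "Chatterer said Buster thought the bear saw the squirrel".split()
trees = list(parser.parse(sent))
print(trees[0])
```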
You can experiment with parsing sentences that involve more deeply nested structures. A parser processes input sentences according to the productions of a grammar, and builds one or more constituent structures that conform to the grammar. A grammar is a declarative specification of well-formedness — it is actually just a string, not a program. A parser is a procedural interpretation of the grammar.
It searches through the space of trees licensed by a grammar to find one that has the required sentence along its fringe. A parser permits a grammar to be evaluated against a collection of test sentences, helping linguists to discover mistakes in their grammatical analysis. A parser can serve as a model of psycholinguistic processing, helping to explain the difficulties that humans have with processing certain syntactic constructions.
Many natural language applications involve parsing at some point; for example, we would expect the natural language questions submitted to a question-answering system to undergo parsing as an initial step. In this section we see two simple parsing algorithms, a top-down method called recursive descent parsing, and a bottom-up method called shift-reduce parsing. We also see some more sophisticated algorithms, a top-down method with bottom-up filtering called left-corner parsing, and a dynamic programming technique called chart parsing.
The simplest kind of parser interprets a grammar as a specification of how to break a high-level goal into several lower-level subgoals. The top-level goal is to find an S. Each of these subgoals can be replaced in turn by sub-sub-goals, using productions that have NP and VP on their left-hand side. Eventually, this expansion process leads to subgoals such as: find the word telescope. Such subgoals can be directly compared against the input sequence, and succeed if the next word is matched.
If there is no match the parser must back up and try a different alternative. The recursive descent parser builds a parse tree during the above process. With the initial goal (find an S), the S root node is created. As the above process recursively expands its goals using the productions of the grammar, the parse tree is extended downwards (hence the name recursive descent). We can see this in action using the graphical demonstration nltk.app.rdparser(). Six stages of the execution of this parser are shown in Figure 4.
During this process, the parser is often forced to choose between several possible productions. For example, in going from step 3 to step 4, it tries to find productions with N on the left-hand side. Much later, as shown in step 5, it finds a complete parse. This is a tree that covers the entire sentence, without any dangling edges. Once a parse has been found, we can get the parser to look for additional parses.
Again it will backtrack and explore other choices of production in case any of them result in a parse. RecursiveDescentParser takes an optional parameter trace. If trace is greater than zero, then the parser will report the steps that it takes as it parses a text.
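For instance, a sketch of turning tracing on, using a toy grammar of my own:

```python
import nltk

grammar1 = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'man'
V -> 'saw'
""")

# With trace=2, the parser prints each expansion, match, and backtrack
# step as it works through the sentence.
rd_parser = nltk.RecursiveDescentParser(grammar1, trace=2)
trees = list(rd_parser.parse("the dog saw a man".split()))
print(len(trees))
```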
Recursive descent parsing has three key shortcomings. First, left-recursive productions like NP -> NP PP send it into an infinite loop. Second, the parser wastes a lot of time considering words and structures that do not correspond to the input sentence.
Third, the backtracking process may discard parsed constituents that will need to be rebuilt again later. Recursive descent parsing is a kind of top-down parsing. Top-down parsers use a grammar to predict what the input will be, before inspecting the input!
However, since the input is available to the parser all along, it would be more sensible to consider the input sentence from the very beginning. This approach is called bottom-up parsing , and we will see an example in the next section. A simple kind of bottom-up parser is the shift-reduce parser.
In common with all bottom-up parsers, a shift-reduce parser tries to find sequences of words and phrases that correspond to the right-hand side of a grammar production, and replace them with the left-hand side, until the whole sentence is reduced to an S. The shift-reduce parser repeatedly pushes the next input word onto a stack; this is the shift operation. If the top n items on the stack match the n items on the right-hand side of some production, then they are all popped off the stack, and the item on the left-hand side of the production is pushed onto the stack.
This replacement of the top n items with a single item is the reduce operation.
This operation may only be applied to the top of the stack; reducing items lower in the stack must be done before later items are pushed onto the stack. The parser finishes when all the input is consumed and there is only one item remaining on the stack, a parse tree with an S node as its root. The shift-reduce parser builds a parse tree during the above process. Each time it pops n items off the stack it combines them into a partial parse tree, and pushes this back on the stack.
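NLTK's ShiftReduceParser implements this strategy; here is a minimal sketch with a toy grammar of my own:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

# The parser shifts words onto a stack and reduces whenever the top of
# the stack matches the right-hand side of a production, finishing when
# the input is consumed and only an S remains on the stack.
sr_parser = nltk.ShiftReduceParser(grammar)
sent = "the dog chased the cat".split()
trees = list(sr_parser.parse(sent))
for tree in trees:
    print(tree)
```

Note that a simple shift-reduce parser like this one does not backtrack, so it can reach a dead end and fail to find a parse even when one exists.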