INET Framework for OMNeT++/OMNEST
inet::PatternMatcher Class Reference

Glob-style pattern matching class, adapted to special OMNeT++ requirements.

#include <PatternMatcher.h>

Classes

struct  Elem
 

Public Member Functions

 PatternMatcher ()
 Constructor.
 
 PatternMatcher (const char *pattern, bool dottedpath, bool fullstring, bool casesensitive)
 Constructor.
 
 ~PatternMatcher ()
 Destructor.
 
void setPattern (const char *pattern, bool dottedpath, bool fullstring, bool casesensitive)
 Sets the pattern to be used by subsequent calls to matches().
 
bool matches (const char *line) const
 Returns true if the line matches the pattern with the given settings.
 
const char * patternPrefixMatches (const char *line, int suffixoffset)
 Similar to matches(): it returns non-nullptr if and only if (1) the pattern ends in a string literal (and not, say, '*' or '**') that contains the line suffix (the part of line starting at character offset suffixoffset), and (2) the pattern matches the whole line, except that (3) when matching the pattern's last string literal, a line shorter than the pattern is also accepted.
 
std::string debugStr ()
 Returns the internal representation of the pattern as a string.
 
void dump ()
 Prints the internal representation of the pattern on the standard output.
 

Static Public Member Functions

static bool containsWildcards (const char *pattern)
 Utility function to determine whether a given string contains wildcards.
 

Private Types

enum  ElemType {
  LITERALSTRING = 0, ANYCHAR, COMMONCHAR, SET,
  NEGSET, NUMRANGE, ANYSEQ, COMMONSEQ,
  END
}
 

Private Member Functions

void parseSet (const char *&s, Elem &e)
 
void parseNumRange (const char *&s, Elem &e)
 
void parseLiteralString (const char *&s, Elem &e)
 
bool parseNumRange (const char *&str, char closingchar, long &lo, long &up)
 
std::string debugStrFrom (int from)
 
bool isInSet (char c, const char *set) const
 
bool doMatch (const char *line, int patternpos, int suffixlen) const
 

Private Attributes

std::vector< Elem > pattern
 
bool iscasesensitive = false
 
std::string rest
 

Detailed Description

Glob-style pattern matching class, adapted to special OMNeT++ requirements.

One instance represents a pattern to match.

Pattern syntax:

  • ? : matches any character except '.'
  • * : matches zero or more characters except '.'
  • ** : matches zero or more characters (any character)
  • {a-z} : matches a character in range a-z
  • {^a-z} : matches a character NOT in range a-z
  • {32..255} : any number (i.e. sequence of digits) in range 32..255 (e.g. "99")
  • [32..255] : any number in square brackets in range 32..255 (e.g. "[99]")
  • backslash \ : takes away the special meaning of the subsequent character

The "except '.'" phrases in the above rules apply only in "dottedpath" mode (see below).

There are three option switches (see setPattern() method):

  • dottedpath: dottedpath=true is the mode used in omnetpp.ini for matching module parameters, like this: "**.mac[*].retries=9". In this mode, '*' cannot "eat" a dot, so it can only match one component (module name) in the path. '**' can be used to match more components. (This is similar to e.g. Java Ant's usage of the asterisk.) In dottedpath=false mode, '*' will match anything.
  • fullstring: selects between full string and substring match. The pattern "ate" will match "whatever" in substring mode, but not in full string mode.
  • casesensitive: selects between case-sensitive and case-insensitive mode.
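
The effect of the dottedpath and fullstring switches can be illustrated with a minimal standalone sketch. The function globMatch() below is hypothetical and not part of INET; it handles only '?', '*', '**' and literal characters, and it assumes case-sensitive, full-string matching. Substring mode is emulated by wrapping the pattern in "**", which is how the setPattern() listing on this page describes it.

```cpp
#include <cassert>

// Hypothetical helper, not the INET implementation: supports only
// '?', '*', '**' and literal characters (no sets, ranges or escapes).
bool globMatch(const char *p, const char *s, bool dottedpath)
{
    if (*p == '\0')
        return *s == '\0';              // full-string match: input must end too
    if (*p == '*') {
        bool isDouble = (p[1] == '*');
        // "**" always crosses dots; a single '*' does so only outside dottedpath mode
        bool crossesDots = isDouble || !dottedpath;
        const char *next = p + (isDouble ? 2 : 1);
        for (;; ++s) {
            if (globMatch(next, s, dottedpath))
                return true;
            if (*s == '\0' || (!crossesDots && *s == '.'))
                return false;
        }
    }
    if (*p == '?') {
        if (*s == '\0' || (dottedpath && *s == '.'))
            return false;
        return globMatch(p + 1, s + 1, dottedpath);
    }
    return *p == *s && globMatch(p + 1, s + 1, dottedpath);
}
```

With this sketch, globMatch("*.retries", "host.mac.retries", true) fails because '*' stops at the first dot, while "**.retries" succeeds; globMatch("**ate**", "whatever", false) mimics the substring match of "ate".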

Rule details:

  • sets, negated sets: They can contain several character ranges and also enumeration of characters. For example: "{_a-zA-Z0-9}","{xyzc-f}". To include '-' in the set, put it at a position where it cannot be interpreted as character range, for example: "{a-z-}" or "{-a-z}". If you want to include '}' in the set, it must be the first character: "{}a-z}", or as a negated set: "{^}a-z}". A backslash is always taken as literal backslash (and NOT as escape character) within set definitions. When doing case-insensitive match, avoid ranges that include both alpha (a-zA-Z) and non-alpha characters, because they might cause funny results.
  • numeric ranges: only nonnegative integers can be matched. The start or the end of the range (or both) can be omitted: "{10..}", "{..99}" or "{..}" are valid numeric ranges (the last one matches any number). The specification must use exactly two dots. Caveat: "*{17..19}" will match "a17","117" and "963217" as well.
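
The numeric range caveat follows from how a leading '*' can absorb digits. A minimal sketch, assuming a pattern of the fixed shape "*{lo..hi}" in non-dotted, full-string mode; starThenRange() is a hypothetical helper written for this illustration, not an INET function:

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical helper: would "*{lo..hi}" match s? The '*' may consume any
// prefix, so every position where an all-digit tail begins is a candidate.
bool starThenRange(const char *s, long lo, long hi)
{
    for (const char *p = s; *p; ++p) {
        if (!std::isdigit(static_cast<unsigned char>(*p)))
            continue;
        // the remainder after the '*' part must consist of digits only
        const char *q = p;
        while (std::isdigit(static_cast<unsigned char>(*q)))
            ++q;
        if (*q != '\0')
            continue;
        long num = std::atol(p);
        if (num >= lo && num <= hi)
            return true;
    }
    return false;
}
```

For "963217" the '*' can consume "9632", leaving "17", which lies in 17..19. Anchoring the number with a preceding literal (for example a dot or an opening bracket) avoids this.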

Member Enumeration Documentation

◆ ElemType

Enumerator
LITERALSTRING 
ANYCHAR 
COMMONCHAR 
SET 
NEGSET 
NUMRANGE 
ANYSEQ 
COMMONSEQ 
END 
{
    LITERALSTRING = 0,
    ANYCHAR,
    COMMONCHAR, // any char except "."
    SET,
    NEGSET,
    NUMRANGE,
    ANYSEQ, // "**": sequence of any chars
    COMMONSEQ, // "*": seq of any chars except "."
    END
};

Constructor & Destructor Documentation

◆ PatternMatcher() [1/2]

inet::PatternMatcher::PatternMatcher ( )

Constructor.

{
}

◆ PatternMatcher() [2/2]

inet::PatternMatcher::PatternMatcher ( const char *  pattern,
bool  dottedpath,
bool  fullstring,
bool  casesensitive 
)

Constructor.

{
    setPattern(pattern, dottedpath, fullstring, casesensitive);
}

◆ ~PatternMatcher()

inet::PatternMatcher::~PatternMatcher ( )

Destructor.

{
}

Member Function Documentation

◆ containsWildcards()

bool inet::PatternMatcher::containsWildcards ( const char *  pattern)
static

Utility function to determine whether a given string contains wildcards.

If it does not, a simple strcmp() might be a faster option than using PatternMatcher.

{
    return strchr(pattern, '?') || strchr(pattern, '*') ||
           strchr(pattern, '\\') || strchr(pattern, '{') ||
           strstr(pattern, "..");
}
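
The fast path suggested above could look like the following sketch. It copies the check from the listing into a free function so that it compiles on its own; matchFullString() and its slowMatch callback are hypothetical names introduced here, and the strcmp() shortcut is only equivalent for case-sensitive, full-string matching.

```cpp
#include <cassert>
#include <cstring>

// Same test as in the listing above, as a standalone free function.
bool containsWildcards(const char *pattern)
{
    return std::strchr(pattern, '?') || std::strchr(pattern, '*') ||
           std::strchr(pattern, '\\') || std::strchr(pattern, '{') ||
           std::strstr(pattern, "..");
}

// Hypothetical dispatcher: plain strcmp() for literal patterns, otherwise
// delegate to a caller-supplied matcher (e.g. one wrapping PatternMatcher).
template <typename SlowMatch>
bool matchFullString(const char *pattern, const char *line, SlowMatch slowMatch)
{
    if (!containsWildcards(pattern))
        return std::strcmp(pattern, line) == 0;
    return slowMatch(pattern, line);
}
```

Note that '[' alone is not treated as a wildcard: "host[3]" is a literal, whereas "host[1..3]" is caught by the ".." test.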

◆ debugStr()

std::string inet::PatternMatcher::debugStr ( )
inline

Returns the internal representation of the pattern as a string.

May be useful for debugging purposes.

{ return debugStrFrom(0); }

◆ debugStrFrom()

std::string inet::PatternMatcher::debugStrFrom ( int  from)
private
{
    std::string result;
    for (size_t k = from; k < pattern.size(); k++) {
        Elem& e = pattern[k];
        switch (e.type) {
            case LITERALSTRING:
                result = result + "\"" + e.literalstring + "\"";
                break;

            case ANYCHAR:
                result += "?!";
                break;

            case COMMONCHAR:
                result += "?";
                break;

            case SET:
                result = result + "SET(" + e.setchars + ")";
                break;

            case NEGSET:
                result = result + "NEGSET(" + e.setchars + ")";
                break;

            case NUMRANGE: {
                char buf[100];
                sprintf(buf, "%ld..%ld", e.fromnum, e.tonum);
                result += buf;
            } break;

            case ANYSEQ:
                result += "**";
                break;

            case COMMONSEQ:
                result += "*";
                break;

            case END:
                break;

            default:
                ASSERT(0);
                break;
        }
        result += " ";
    }
    return result;
}

◆ doMatch()

bool inet::PatternMatcher::doMatch ( const char *  line,
int  patternpos,
int  suffixlen 
) const
private
{
    // Note: the body refers to the first two parameters as s (line) and k (patternpos).
    while (true) {
        const Elem& e = pattern[k];
        long num; // case NUMRANGE
        int len; // case LITERALSTRING
        switch (e.type) {
            case LITERALSTRING:
                len = e.literalstring.length();
                // special case: last string literal with prefix match: allow s to be shorter
                if (suffixlen > 0 && k == (int)pattern.size() - 2)
                    len -= suffixlen;
                // compare
                if (iscasesensitive ? strncmp(s, e.literalstring.c_str(), len) : strncasecmp(s, e.literalstring.c_str(), len))
                    return false;
                s += len;
                break;

            case ANYCHAR:
                if (!*s)
                    return false;
                s++;
                break;

            case COMMONCHAR:
                if (!*s || *s == '.')
                    return false;
                s++;
                break;

            case SET:
                if (!*s)
                    return false;
                if (!isInSet(*s, e.setchars.c_str()))
                    return false;
                s++;
                break;

            case NEGSET:
                if (!*s)
                    return false;
                if (isInSet(*s, e.setchars.c_str()))
                    return false;
                s++;
                break;

            case NUMRANGE:
                if (!opp_isdigit(*s))
                    return false;
                num = atol(s);
                while (opp_isdigit(*s))
                    s++;
                if ((e.fromnum >= 0 && num < e.fromnum) || (e.tonum >= 0 && num > e.tonum))
                    return false;
                break;

            case ANYSEQ:
                // potential shortcuts: if pattern ends in ANYSEQ, rest of the input
                // can be anything; if pattern ends in ANYSEQ LITERAL, it's enough if
                // input ends in the literal string
                if (k == (int)pattern.size() - 2)
                    return true;
                if (k == (int)pattern.size() - 3 && pattern[k + 1].type == LITERALSTRING)
                    return opp_stringendswith(s, pattern[k + 1].literalstring.c_str());

                // general case
                while (true) {
                    if (doMatch(s, k + 1, suffixlen))
                        return true;
                    if (!*s)
                        return false;
                    s++;
                }
                break; // at EOS

            case COMMONSEQ:
                while (true) {
                    if (doMatch(s, k + 1, suffixlen))
                        return true;
                    if (!*s || *s == '.')
                        return false;
                    s++;
                }
                break;

            case END:
                return !*s;

            default:
                ASSERT(0);
                break;
        }
        k++;
        ASSERT(k < (int)pattern.size());
    }
}

Referenced by matches(), and patternPrefixMatches().

◆ dump()

void inet::PatternMatcher::dump ( )
inline

Prints the internal representation of the pattern on the standard output.

May be useful for debugging purposes.

{ printf("%s", debugStr().c_str()); }

◆ isInSet()

bool inet::PatternMatcher::isInSet ( char  c,
const char *  set 
) const
private
{
    ASSERT((strlen(set) & 1) == 0);
    if (!iscasesensitive)
        c = opp_toupper(c); // set is already uppercase here
    while (*set) {
        if (c >= *set && c <= *(set + 1))
            return true;
        set += 2;
    }
    return false;
}

Referenced by doMatch().

◆ matches()

bool inet::PatternMatcher::matches ( const char *  line) const

Returns true if the line matches the pattern with the given settings.

See setPattern().

{
    ASSERT(pattern[pattern.size() - 1].type == END);

    // shortcut: omnetpp.ini keys often begin with "*" or "**"
    // but end in a string literal. So it's usually a performance win
    // to first check that the last string literal of the pattern matches
    // the end of the string. (We do the shortcut only in the case-sensitive
    // case. omnetpp.ini is case sensitive.)

    if (pattern.size() >= 2 && iscasesensitive) {
        const Elem& e = pattern[pattern.size() - 2];
        if (e.type == LITERALSTRING) {
            // return if last 2 chars don't match
            int pattlen = e.literalstring.size();
            int linelen = strlen(line);
            if (pattlen >= 2 && linelen >= 2 && (line[linelen - 1] != e.literalstring.at(pattlen - 1) ||
                                                 line[linelen - 2] != e.literalstring.at(pattlen - 2))) // FIXME why doesn't work for pattlen==1 ?
                return false;
        }
    }

    // perform full-blown pattern matching
    return doMatch(line, 0, 0);
}

Referenced by inet::GateScheduleConfiguratorBase::addFlows(), inet::ospfv2::Ospfv2ConfigReader::findMatchingConfig(), inet::ospfv2::Ospfv2ConfigReader::getInterfaceByXMLAttributesOf(), inet::ospfv2::Ospfv2ConfigReader::loadConfigFromXML(), and inet::Ipv4RoutingTable::updateNetmaskRoutes().

◆ parseLiteralString()

void inet::PatternMatcher::parseLiteralString ( const char *&  s,
Elem &  e 
)
private
{
    e.type = LITERALSTRING;
    while (*s && *s != '?' && *s != '{' && *s != '*') {
        long dummy;
        const char *s1;
        if (*s == '\\')
            e.literalstring += *(++s);
        else
            e.literalstring += *s;
        if (*s == '[' && parseNumRange((s1 = s), ']', dummy, dummy))
            break;
        s++;
    }
}

Referenced by setPattern().

◆ parseNumRange() [1/2]

void inet::PatternMatcher::parseNumRange ( const char *&  s,
Elem &  e 
)
private

Referenced by parseLiteralString(), and setPattern().

◆ parseNumRange() [2/2]

bool inet::PatternMatcher::parseNumRange ( const char *&  str,
char  closingchar,
long &  lo,
long &  up 
)
private
{
    //
    // try to parse "[n..m]" or "{n..m}" and return true on success.
    // str should point at "[" or "{"; on success return it'll point to "]" or "}",
    // and on failure it'll be unchanged. n and m will be stored in lo and up.
    // They are optional -- if missing, lo or up will be set to -1.
    //
    lo = up = -1L;
    const char *s = str + 1; // skip "[" or "{"
    if (opp_isdigit(*s)) {
        lo = atol(s);
        while (opp_isdigit(*s))
            s++;
    }
    if (*s != '.' || *(s + 1) != '.')
        return false;
    s += 2;
    if (opp_isdigit(*s)) {
        up = atol(s);
        while (opp_isdigit(*s))
            s++;
    }
    if (*s != closingchar)
        return false;
    str = s;
    return true;
}

◆ parseSet()

void inet::PatternMatcher::parseSet ( const char *&  s,
Elem &  e 
)
private
{
    s++; // skip "{"
    e.type = SET;
    if (*s == '^') {
        e.type = NEGSET;
        s++;
    }
    // Note: to make "}" part of the set, it must be first within the braces
    const char *sbeg = s;
    while (*s && (*s != '}' || s == sbeg)) {
        char range[3];
        range[2] = 0;
        if (*(s + 1) == '-' && *(s + 2) && *(s + 2) != '}') {
            // store "A-Z" as "AZ"
            range[0] = *s;
            range[1] = *(s + 2);
            s += 3;
        }
        else {
            // store "X" as "XX"
            range[0] = range[1] = *s;
            s++;
        }
        if (!iscasesensitive) {
            // if one end of range is alpha and the other is not, funny things will happen
            range[0] = opp_toupper(range[0]);
            range[1] = opp_toupper(range[1]);
        }
        e.setchars += range;
    }
    if (!*s)
        throw cRuntimeError("unmatched '}' in expression");
    s++; // skip "}"
}

Referenced by setPattern().
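
The pair encoding used by parseSet() and consumed by isInSet() may be easier to follow in isolation. The sketch below is a simplified, case-sensitive rewrite as free functions; encodeSet() and inSet() are hypothetical names, and the input is assumed to be the text between the braces, without a leading '^'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turn a set body such as "_a-zA-Z0-9" into range
// pairs ("__azAZ09"). "A-Z" is stored as "AZ", a single "X" as "XX".
std::string encodeSet(const std::string& body)
{
    std::string pairs;
    std::size_t i = 0;
    while (i < body.size()) {
        // a '-' forms a range only if another character follows it
        if (i + 2 < body.size() && body[i + 1] == '-') {
            pairs += body[i];
            pairs += body[i + 2];
            i += 3;
        }
        else {
            pairs += body[i];
            pairs += body[i];
            i++;
        }
    }
    return pairs;
}

// Hypothetical helper: membership test over the encoded pairs.
bool inSet(char c, const std::string& pairs)
{
    for (std::size_t i = 0; i + 1 < pairs.size(); i += 2)
        if (c >= pairs[i] && c <= pairs[i + 1])
            return true;
    return false;
}
```

This also shows why a '-' at the start or end of the set is taken literally: there is no character on one side to form a range with.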

◆ patternPrefixMatches()

const char * inet::PatternMatcher::patternPrefixMatches ( const char *  line,
int  suffixoffset 
)

Similar to matches(): it returns non-nullptr if and only if (1) the pattern ends in a string literal (and not, say, '*' or '**') that contains the line suffix (the part of line starting at character offset suffixoffset), and (2) the pattern matches the whole line, except that (3) when matching the pattern's last string literal, a line shorter than the pattern is also accepted.

If the above conditions hold, it returns the rest of the pattern. The returned pointer is valid until the next call to this method.

This method is used by cIniFile's getEntriesWithPrefix(), used e.g. to find RNG mapping entries for a module. For that, we have to find all ini file entries (keys) like "net.host1.gen.rng-NN" where NN=0,1,2,... In cIniFile, every entry is a pattern ("**.host*.gen.rng-1", "**.*.gen.rng-0", etc.). So we'd invoke patternPrefixMatches("net.host1.gen.rng-", 13) (i.e. suffix=".rng-") to find those entries (patterns) which can expand to "net.host1.gen.rng-0", "net.host1.gen.rng-1", etc.

See matches().
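
The RNG example can be traced with a small standalone sketch that reproduces only the last-literal step (the strstr() search and the computation of the returned rest). restOfLastLiteral() is a hypothetical helper; it assumes the pattern's final string literal has already been extracted, and it omits the subsequent doMatch() check on the rest of the pattern.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: if lastLiteral contains the line suffix that starts
// at suffixoffset, store what follows it in rest and return true.
bool restOfLastLiteral(const std::string& lastLiteral, const char *line,
                       int suffixoffset, std::string& rest)
{
    const char *pattstring = lastLiteral.c_str();
    const char *suffix = line + suffixoffset;   // e.g. ".rng-"
    const char *p = std::strstr(pattstring, suffix);
    if (!p)
        return false;
    rest = p + std::strlen(suffix);             // e.g. "1"
    return true;
}
```

For the line "net.host1.gen.rng-" and offset 13, the suffix is ".rng-". The pattern "**.host*.gen.rng-1" ends in the literal ".gen.rng-1", so the rest is "1"; a pattern ending in ".gen.seed" is rejected at this step.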

{
    if (!iscasesensitive)
        throw cRuntimeError("PatternMatcher: patternPrefixMatches() doesn't support case-insensitive match");

    // pattern must end in a literal string...
    ASSERT(pattern[pattern.size() - 1].type == END);
    if (pattern.size() < 2)
        return nullptr;
    Elem& e = pattern[pattern.size() - 2];
    if (e.type != LITERALSTRING)
        return nullptr;

    // ...with the suffixlen characters at the end of 'line'
    const char *pattstring = e.literalstring.c_str();
    const char *p = strstr(pattstring, line + suffixoffset);
    if (!p)
        return nullptr;
    p += strlen(line + suffixoffset);
    rest = p;
    int pattsuffixlen = e.literalstring.size() - (p - pattstring);

    // pattern, if we cut off the 'rest', must exactly match 'line'
    return doMatch(line, 0, pattsuffixlen) ? rest.c_str() : nullptr;
}

◆ setPattern()

void inet::PatternMatcher::setPattern ( const char *  pattern,
bool  dottedpath,
bool  fullstring,
bool  casesensitive 
)

Sets the pattern to be used by subsequent calls to matches().

See the general class description for the meaning of the remaining arguments. Throws cException if the pattern is malformed.

{
    // Note: the body refers to the first parameter as patt; pattern is the member vector.
    pattern.clear();
    iscasesensitive = casesensitive;

    // "tokenize" pattern
    const char *s = patt;
    while (*s != '\0') {
        Elem e;
        switch (*s) {
            case '?':
                e.type = dottedpath ? COMMONCHAR : ANYCHAR;
                s++;
                break;

            case '[':
                if (pattern.empty() || pattern.back().type != LITERALSTRING || !parseNumRange(s, ']', e.fromnum, e.tonum))
                    parseLiteralString(s, e);
                else
                    e.type = NUMRANGE;
                break;

            case '{':
                if (parseNumRange(s, '}', e.fromnum, e.tonum)) {
                    e.type = NUMRANGE;
                    s++;
                }
                else
                    parseSet(s, e);
                break;

            case '*':
                if (*(s + 1) == '*') {
                    e.type = ANYSEQ;
                    s += 2;
                }
                else {
                    e.type = dottedpath ? COMMONSEQ : ANYSEQ;
                    s++;
                }
                break;

            default:
                parseLiteralString(s, e);
                break;
        }
        pattern.push_back(e);
    }

    if (!fullstring) {
        // for substring match, we add "**" at both ends of the pattern (unless already there)
        Elem e;
        e.type = ANYSEQ;
        if (pattern.empty() || pattern.back().type != ANYSEQ)
            pattern.push_back(e);
        if (pattern.front().type != ANYSEQ)
            pattern.insert(pattern.begin(), e);
    }
    Elem e;
    e.type = END;
    pattern.push_back(e);
}

Referenced by PatternMatcher().

Member Data Documentation

◆ iscasesensitive

bool inet::PatternMatcher::iscasesensitive = false
private

◆ pattern

std::vector<Elem> inet::PatternMatcher::pattern
private

◆ rest

std::string inet::PatternMatcher::rest
private

Referenced by patternPrefixMatches().


The documentation for this class was generated from the following files:
  • PatternMatcher.h
  • PatternMatcher.cc