308-420A, Assignment 1

Copyright ©2008  T. H. Merrett

COMP 617	Information Systems		Winter 2008		Week 7

				Series Queries

1. Two motivating queries from [DiaoImmGyll:SASE+]

 Query 2 captures a one-hour period in which the price of a particular stock
 increased from 10 to 20 and the trading volume of the stock stayed relatively
 stable.

 Query 3 captures a more complex trend for each stock: in the last hour, the
 volume started high, but after a period when the price increased or remained
 relatively stable, the volume plummeted.

 To approach these queries, we need some preliminaries.

2. Running automata.

 Here's the autmaton from [week1p1].

	aababb		Str(sq ch)	DFA( sS ch eS )	term( eS)
			     1  a	      1  a 125	     136
	(a|b)*(abb|aa*b)     2  a	      1  b   1	      14
			     3  b	    125  a 125
			     4  a	    125  b 136
			     5  b	    136  a 125
			     6  b	    136  b  14
					     14  a 125
			 		     14  b   1

 Here's a programmer-written funop [week2p1].

	comp makeAutom(DFA, term, start, runauto) is
	{ state automSt intg;
	  comp runauto(value,accum) is funop
	  { automSt <- DFA[automSt,value];
	    accum <- [] in term[automSt]
	  }
	  automSt <- start
	}

 (Note		automSt <- DFA[automSt,value];
 is syntactic shorthand for
 		automSt <-  [pick eS] in DFA[automSt,value];
 The type system can probably convert from singleton unary to integer.
 NB [pick eS] is also shorthand for the less direct [red max of eS].)

 Let's run this on Str.
	makeAutom(in DFA, in term, in 1, out runauto);
	let res be fun runauto of ch order sq;
	let match be red max of res;	<< NB false < true   OR  red or >>
	let matchlast be red max of (sq = red max of sq) and res;

		Str(sq ch) res match matchlast
		     1  a   f    t       t
		     2  a   f    t       t
		     3  b   t    t       t
		     4  a   f    t       t
		     5  b   t    t       t
		     6  b   t    t       t

3. Automata on predicates. Let's look at a STOCK relation and derive some
 predicates related to queries 2 and 3 above.
	let firstime be red min of time;
	let firstvol be red max of if time = firstime then volume else -1;
	let volstable be 0.9*firstvol <= volume <= 1.1*firstvol;
	let prevprice be if time = firstime then price - 1 else
	    fun pred of price order time;
	let priceincr be price >= prevprice;

	STOCK
	(symbol time price volume) volstable priceincr allVSorPI
	 GOOG  1000   10   1100        t         t         t
	 GOOG  1006   12   1400        f         t         t
	 GOOG  1012   10   1200        t         f         t

 If we want to know whether the volume was stable or the price increased in
 the period covered by this relation, we can find
	let allVSorPI be red and of (volstable or priceincr)
 which is true.

 But if we want to detect a pattern such as tft or ttf, we need automata.

	autTFT		termTFT			autTTF		termTTF
	(sS pr eS)	(S)			(sS pr eS)	(S)
	  0  t  1	 3			  0  t  1	 3
	  0  f  0				  0  f  0
	  1  t  1				  1  t  2
	  1  f  2				  1  f  0
	  2  t  3				  2  t  2
	  2  f  2				  2  f  3
	  3  t  3				  3  t  3
	  3  f  3				  3  f  3

	makeAutom(in autTFT, in termTFT, in 0, out runauto);
	let vsTFT be fun runauto of volstable order time;
	let allvsTFT be red max of vsTFT;
	makeAutom(in autTTF, in termTTF, in 0, out runauto);
	let piTTF be fun runauto of priceincr order time;
	let allpiTTF be red max of piTTF;

 And if we want to detect the combined pattern, we could combine the automata:
 the two predicates combine to form the integers 0,..,3.

	let automIN be 2*(if volstable then 1 else 0) +
			 (if priceincr then 1 else 0);

		autCombined			termCombined
		(sS pr eS)	(sS pr eS)	(S)
		  0  0  0	  2  0  0	 3
		  0  1  0	  2  1  0
		  0  2  0	  2  2  3
		  0  3  1	  2  3  1
		  1  0  0	  3  0  3
		  1  1  2	  3  1  3
		  1  2  0	  3  2  3
		  1  3  1	  3  3  3

	makeAutom(in autCombined, in termCombined, in 0, out runauto);
	let combined be fun runauto of automIN order time;
	let allCombined be red max of (time = red max of time) and combined;

 Of course, this is redundant for this example, because  allCombined  should
 be identical to  allvsTFT and allpiTTF . But if we had more than three tuples
 in STOCK, then  allvsTFT  would be true if tft occurred _anywhere_ under
 volstable, and  allpiTTF   would be true if ttf occurred _anywhere_ under
 priceincr ,  and we would need to write a combined automaton for the
 synchronized occurrence of the patterns.

 We'll need an automaton for Query 3.

4. Windowing.

 Let's do some preparation for queries 2 and 3, first considering the whole of
 relation STOCKS. (The stocks are GOOGle and, later, BALLard.)
	let firstime be red min of time;
	let firstvol be red max of if time = firstime then volume else -1;
	let volstable be 0.9*firstvol <= volume <= 1.1*firstvol;
	let firstprice be red max of if time = firstime then price else -1;
	let pristable be 0.9*firstprice <= price <= 1.1*firstprice;
	let prevprice be if time = firstime then price - 1 else
	    fun pred of price order time;
	let priceincr be price >= prevprice;
	let volplunge be volume <= 0.8*(fun pred of volume order time);

 Here's the effect on STOCK. (For the moment, look only at the first column
 under each of the four virtual attributes shown.)

	STOCK
	(symbol time price volume) volstable pristable priceincr volplunge vpp
	 GOOG  1000   10   1100    t         t         t         f          f
	 GOOG  1006   12   1400    f t       f t       t t       f f        f
	 GOOG  1012   10   1200    t f t     t f t     f f t     f f f      t
	 GOOG  1018   13   1182    t f t     f t f     t t t     f f f      f
	 GOOG  1024   15   1200    t f t     f f f     t t t     f f f      f
	 GOOG  1030   16   1300    f t t     f f f     t t t     f f f      f
	 GOOG  1036   11   1100    t f t     t f t     f f f     f f f      f
	 GOOG  1042   18   1090    t f t     f f f     t t t     f f f      f
	 GOOG  1048   19   1080    t f t     f f f     t t t     f f f      f
	 GOOG  1054   20   1070    t f t     f f f     t t t     f f f      f
	 GOOG  1100   21   1000    t f f     f f f     t t t     f f f      f
	 GOOG  1106   21    800    f   f     f   f     t   t     t   t      f
	 GOOG  1112   20    870    f         f         f         f          f
	 GOOG  1118   21   1000    t         f         t         f          f
	 GOOG  1124   23   1070    t         f         t         f          f
	 GOOG  1130   24   1070    t         f         t         f          f
	 GOOG  1036   20   1070    t         f         t         f          f
	 GOOG  1142   24    870    f         f         f         f          f
	 GOOG  1148   26    800    f         f         f         f          f

 We can find out if the volume is stable and the price is stable or increasing
 over the whole relation:
	red and of (volstable and (pristable or priceincr))
 This is false (as is  red and of volstable  by itself, and
 red and of (pristable or priceincr)  by itself).

 But we don't want to know if these hold over the whole relation, but only over
 hour-long windows. We need a windowing reduction. We replace  red  above by a
 new operator,  redwin(),  which can have a parameter saying how big the window
 is. For the moment, let's use an integer count, redwin(10).

	let vpp be redwin(10) and of (volstable and (pristable or priceincr))
		   order time;

 Note that redwin() must be controlled by an  order  clause (and it may also
 have a group  by  clause).

 Two windows are shown above in the second and third columns under volstable,
 pristable, priceincr and volplunge. The resulting attribute, vpp, is shown on
 the right, with the value for each window shown at the starting entry for the
 window.

 Note that redwin() wraps around cyclically, unless we write code to stop it.

 The idea behind redwin() is that each window is considered to be a separate
 relation, and any virtual attribute, especially aggregates from red, equiv,
 fun and par, mentioned in a redwin() statement is to be actualized in that
 separate relation. Redwin() returns a single value for every window, which
 is why it must be a  red .

5. Here's redwin() in a group-by context.
	let allPI be redwin(3) and of priceincr order time by symbol;
		STOCK
		(symbol time price volume) priceincr allPI
		  GOOG  1000   10   1100       t       f
		  GOOG  1006   12   1400       t       f
		  GOOG  1012   10   1100       f       t
		  GOOG  1018   13   1000       t       t
		  GOOG  1024   15   1200       t       f  << remember: cyclic >>
		  GOOG  1030   16   1300       t       f
		  BALL  1000    2    700       t       f
		  BALL  1006    3    800       t       f
		  BALL  1012    1    600       f       t
		  BALL  1018    1    600       t       f
		  BALL  1024    2    800       t       f  << remember: cyclic >>
		  BALL  1030    1    600       f       t

6. We could have a variable window size, determined by an attribute of the
 relation.

	let ct be redwin(wsize) + of 1 order seq;
	let tot be redwin(wsize) + of seq order seq;
		A(seq wsize) ct tot
		   1    3     3   6
		   2    3     3   9
		   3    4     4  13
		   4    2     2   9
		   5    2     2   6

7. In addition to the counting parameter for  redwin()  we can have a logical
 parameter which tells us when to stop.

	let firstime be red min of time;
	let stop be time >= firstime + 18;
	let allPI be redwin(stop) and of priceincr order time by symbol;

 Both count and stopping predicate may be passed to redmin() (distinguished by
 type), and the logical  or  is taken to determine the window size.

 The stopping predicate can be written so as to stop the calculations before
 the specified window size is reached.

	let vpp be redwin(10,not(volstable and (pristable or priceincr)))
		 and of (volstable and (pristable or priceincr)) order time;

 stops the window, and the  red and of  calculation, as soon as a false value
 of the predicate is reached.

 An implementation might even be able to use the stopping predicate to skip the
 window past the point of failure, since intermediate windows will give false
 results.

Post-lecture note (080412).
 Let's generalize this to _any_ predicate limiting the window range given by
 the size parameter. Thus in a window of size 5 with sequence number s running
 within the window from 1 to 5, the predicate  s>1 & s<5  or  2<=s & s<=4
 would limit the window to tuples with those sequence numbers. See queryHum.pdf,
 Note 9, for an example.

 For reasons motivated by that example, we can allow the window size parameter
 to be negative as well as positive, and just use the absolute value.

 As a pragma to help the implementation know when to start or stop, in the case
 of a window predicate-limited to single contiguous subset of the range given
 by the size parameter, we could provide side-effect predicates which toggle
 when their argument becomes true:  start  and  stop . Thus in the above example
 in the predicate  start(2=s) & stop(s=5),  start() would initially be false and
 stop() true, and as s progresses from 1 to 5 the predicate would be F,T,T,T,F.

8. Query 2 captures a one-hour period in which the price of a particular stock
 increased from 10 to 20 and the trading volume of the stock stayed relatively
 stable.

	let firstime be red min of time;
	let firstvol be red max of if time = firstime then volume else -1;
	let volstable be 0.9*firstvol <= volume <= 1.1*firstvol;
	let prevprice be if time = firstime then price - 1 else
	    fun pred of price order time;
	let priceincr be price >= prevprice;
	let price10 be red max of price = 10;
	let price20 be red max of price = 20;
	let query2 be redwin(time >= firstime + 60 or time < firstime)
		and of (time10 and time20 and volstable and priceincr)
		order time by symbol;

	STOCK
	(symbol time price volume) volstable priceincr query2
	 GOOG  1000   10   1100    t         t           f
	 GOOG  1006   12   1400    f         t         
	 GOOG  1012   10   1200    t         f         
	 GOOG  1018   13   1182    t         t         
	 GOOG  1024   15   1200    t         t         
	 GOOG  1030   16   1300    f         t         
	 GOOG  1036   11   1100    t t       f t         f
	 GOOG  1042   18   1090    t t       t t       
	 GOOG  1048   19   1080    t t       t t       
	 GOOG  1054   20   1070    t t       t t       
	 GOOG  1100   21   1000      t         t       
	 GOOG  1106   21    800      f         t       
	 GOOG  1112   20    870      f t       f t       f
	 GOOG  1118   21   1000      t f       t t     
	 GOOG  1124   23   1070      t f       t t     
	 GOOG  1130   24   1070      t f       t t     
	 GOOG  1036   20   1070        f         t     
	 GOOG  1142   24    870        t         f     
	 GOOG  1148   26    800        t t       f t     f

 Note that  redwin( .. or time < firstime  prevents cycling back.

9. Query 3 captures a more complex trend for each stock: in the last hour, the
 volume started high, but after a period when the price increased or remained
 relatively stable, the volume plummeted.

 We need an automaton to ensure that the volume plummets _after_ the stable
 increase.

	let firstime be red min of time;
	let firstvol be red max of if time = firstime then volume else -1;
	let hivol be red max of firstvol > 1000;
	let firstprice be red max of if time = firstime then price else -1;
	let pristable be 0.9*firstprice <= price <= 1.1*firstprice;
	let prevprice be if time = firstime then price - 1 else
	    fun pred of price order time;
	let priceincr be price >= prevprice;
	let priceOK be pristable or priceincr;
	let volplunge be volume <= 0.8*(fun pred of volume order time);
	let automIn be 2*(if priceOK then 1 else 0) +if volplunge then 1 else 0;
	relation automQ3(sS,in,eS) <- {(0,0,0),(0,1,0),(0,2,1),(0,3,0),
	       (1,0,0),(1,1,0),(1,2,1),(1,3,2),(2,0,2),(2,1,2),(2,2,2),(2,3,2)};
	relation termQ3(eS) <- {(2)};
	makeAutom(in automQ3, in termQ3, in 0, out runauto);
	let automCheck be fun runauto of automIn order time;
	let query3 be redwin(time >= firstime + 60  or time < firstime)
	    and of (hivol and red max of automCheck)
	    order time by symbol;

	STOCK
	(symbol time price volume) priceOK volplunge query3
	 GOOG  1000   10   1100    t       f           f
	 GOOG  1006   12   1400    t t     f f         f
	 GOOG  1012   10   1200    t f t   f f f       t
	 GOOG  1018   13   1182    t t t   f f f       t
	 GOOG  1024   15   1200    t t t   f f f       t
	 GOOG  1030   16   1300    t t t   f f f       t
	 GOOG  1036   11   1100    t f t   f f f       t
	 GOOG  1042   18   1090    t t t   f f f       t
	 GOOG  1048   19   1080    t t t   f f f       t
	 GOOG  1054   20   1070    t t t   f f f       t
	 GOOG  1100   21   1000      t t     f f       t
	 GOOG  1106   21    800        t       t       f
	 GOOG  1112   20    870                        f
	 GOOG  1118   21   1000                        f           
	 GOOG  1124   23   1070                        f
	 GOOG  1130   24   1070                        f
	 GOOG  1036   20   1070                        f
	 GOOG  1142   24    870                        f
	 GOOG  1148   26    800                        f


10. I have added only one new construct, redwin(), in all the above. (Running
 automata follows directly from the programmer-definable funop of [week2p1],
 and this is an expected capability, given that Aldat supports funops at all.)
 Note that redwin() is possible only because of virtual attributes: it is
 conceptually as easy to think of an attribute being actualized on a window as
 on a whole relation. Without virtual attributes, all sorts of constructs
 would be needed as we see in [DiaoImmGyll:SASE+].

 But redwin() was not the first construct I came up with. On my way there I
 first produced a much more elaborate and less powerful set of operators. I'd
 like to discuss these even though they are replaced by redwin(), in order to
 show some of the discovery process. Also, the obsolete facilities might offer
 a way to implement redwin().

	let w be winsize(win) 3 order seq;
		<< shorthand below: 3,4,0 is {(3),(4),(0)} >>
	let base be fun + of 1 order seq;
	let linear_w be where win